Abstract

We propose some axioms for hierarchical clustering of probability measures and investigate
their ramifications. The basic idea is to let the user stipulate the clusters for some
elementary measures. This is done without the need for any notion of metric, similarity,
or dissimilarity. Our main results then show that for each suitable choice of user-defined
clustering on elementary measures we obtain a unique notion of clustering on a large set
of distributions satisfying a set of additivity and continuity axioms. We illustrate the
developed theory by numerous examples, including some with and some without a density.
Keywords: axiomatic clustering, hierarchical clustering, infinite samples clustering,
density level set clustering, mixed Hausdorff-dimensions


1. Introduction 


Clustering is one of the most basic tools to investigate unsupervised data: finding groups
in data. Its applications range from categorization of news articles and medical imaging
to crime analysis. For this reason, a wealth of algorithms have been proposed, among
[...] and the references therein.

However, each ansatz has its own implicit or explicit definition of what clustering is.
Indeed, for k-means it is a particular Voronoi partition, for Hartigan (1975, Section 11.13) it
is the collection of connected components of a density level set, and for generative models it
is the decomposition of mixed measures into their parts. Stuetzle (2003) stipulates a grouping
around the modes of a density, while Chacon (2014) uses gradient flows. Thus, there is no
universally accepted definition.

A good notion of clustering certainly needs to address the inherent random variability in
data. This can be achieved by notions of clusterings for infinite sample regimes or complete
knowledge scenarios, as von Luxburg and Ben-David (2005) put it. Such an approach has
various advantages: one can talk about ground truth, can compare alternative clustering
algorithms (empirically, theoretically, or in a combination of both by using artificial data), and
can define and establish consistency and learning rates. Defining clusters as the connected
components of density level sets satisfies all of these requirements. Yet it seems to be slightly
ad hoc, and it will always be debatable whether thin bridges should connect components
and whether close components should really be separated. Similar concerns may be raised
for other infinite sample notions of clusterings such as Stuetzle (2003) and Chacon (2014).


In this work we address these and other issues by asking ourselves: What does the set 
of clustering functions look like? What can defining properties—or axioms—of clustering 
functions be and what are their ramifications? Given such defining properties, are there 
functions fulfilling these? How many are there? Can a fruitful theory be developed? And 
finally, for which distributions do we obtain a clustering and for which not? 

These questions have led us to an axiomatic approach. The basic idea is to let the
user stipulate the clusters for some elementary measures. Here, the choice does not need
to rely on a metric or another pointwise notion of similarity; only basic shapes for
geometry and a separation relation have to be specified. Our main results then show that
for each suitable choice we obtain a unique notion of clustering satisfying a set of additivity
and continuity axioms on a large set of measures. These will be motivated in Section 1.2
and are defined in Axioms 1, 2, and 3. The major technical achievement of this work is
Theorem 20: it establishes criteria (cf. Definition [?]) to ensure a unique limit structure,
which in turn makes it possible to define a unique additive and continuous clustering in
Theorem 21. Furthermore, in Section 3.5 we explain how this framework is linked to
density-based clustering, and in the examples of Section 4.3 we investigate the consequences
in the setting of mixed Hausdorff dimensions.


1.1 Related Work 


Some axioms for clustering have been proposed and investigated, but to our knowledge, all
approaches concern clustering of finite data. Jardine and Sibson (1971) were probably the
first to consider axioms for hierarchical clusterings: these are maps of sets of dissimilarity
matrices to sets of, e.g., ultrametric matrices. Given such sets they obtain continuity and
uniqueness of such a map using several axioms. This setting was used by Janowitz and Wille
(1995) to classify clusterings that are equivariant for all monotone transformations of the
values of the distance matrix. Later, Puzicha et al. (1999) investigated axioms for cost
functions of data partitionings and then obtained clustering functions as optimizers of such
cost functions. They also considered a hierarchical version, marking the last axiomatic
treatment of that case until today. More recently, Kleinberg (2003) put forward an impossibility
result. He gives three axioms and shows that any (non-hierarchical) clustering of distance
matrices can fulfill at most two of them. Zadeh and Ben-David (2009) remedy the
impossibility by restricting to k-partitions, and they use minimum spanning trees to characterize
different clustering functions. A completely different setting is Meila (2005), where an arsenal
of axioms is given for distances of clustering partitions. They characterize some distances
(variation of information, classification error metric) using different subsets of their axioms.

One of the reviewers brought clustering of discrete data to our attention. As far as
we understand, consensus clustering (Mirkin, 1975; Day and McMorris, 2003) and additive
clustering (Shepard and Arabie, 1979; Mirkin, 1987) are popular in social studies clustering
communities. What we call additive clustering in this work is something completely different,
though. Still, application of our notions to clustering of discrete structures warrants further
research.


1.2 Spirit of Our Ansatz 

Let us now give a brief description of our approach. To this end assume for simplicity that
we wish to find a hierarchical clustering for certain distributions on $\mathbb{R}^d$. We denote
the set of such distributions by $\mathcal{P}$. Then a clustering is simply a map $c$ that assigns
to every $P \in \mathcal{P}$ a collection $c(P)$ of non-empty events. Since we are interested in
hierarchical clustering, $c(P)$ will always be a forest, i.e. we have

$$A, A' \in c(P) \implies A \perp A' \text{ or } A \subset A' \text{ or } A \supset A'. \tag{1}$$

Here $A \perp A'$ means sufficiently distinct, i.e. $A \cap A' = \emptyset$ or something stronger
(cf. Definition 1). Following the idea that eventually one needs to store and process the
clustering $c(P)$ on a computer, our first axiom assumes that $c(P)$ is finite. For a distribution
with a continuous density the level set forest, i.e. the collection of all connected components
of density level sets, will therefore not be viewed as a clustering. For densities with finitely
many modes, however, this level set forest consists of long chains interrupted by finitely many
branchings. In this case, the most relevant information for clustering is certainly represented
at the branchings and not in the intermediate chains. Based on this observation, our second
clustering axiom postulates that $c(P)$ does not contain chains. More precisely, if $s(F)$ denotes
the forest that is obtained by replacing each chain in the forest $F$ by the maximal element of
the chain, our structured forest axiom demands that

$$s(c(P)) = c(P). \tag{2}$$

To simplify notation we further extend the clustering to the cone defined by $\mathcal{P}$ by setting

$$c(\alpha P) := c(P) \tag{3}$$

for all $\alpha > 0$ and $P \in \mathcal{P}$. Equivalently, we can view $\mathcal{P}$ as a collection of
finite non-trivial measures and $c$ as a map on $\mathcal{P}$ such that for $\alpha > 0$ and
$P \in \mathcal{P}$ we have $\alpha P \in \mathcal{P}$ and $c(\alpha P) = c(P)$. Needless to say, this
extended view on clusterings does not change the nature of a clustering.

Our next two axioms are based on the observation that there exist not only distributions
for which the "right notion" of a clustering is debatable, but also distributions
for which everybody would agree about the clustering. For example, if $P$ is the uniform
distribution on a Euclidean ball $B$, then certainly everybody would set $c(P) = \{B\}$. Clearly,
other such examples are possible, too, and therefore we view the determination of
distributions with such simple clusterings as a design decision. More precisely, we assume that we
have a collection $\mathcal{A}$ of closed sets, called base sets, and a family
$\mathcal{Q} = \{Q_A\}_{A \in \mathcal{A}} \subset \mathcal{P}$, called base measures, with the property
$A = \operatorname{supp} Q_A$ for all $A \in \mathcal{A}$. Now, our base measure axiom stipulates

$$c(Q_A) = \{A\}. \tag{4}$$

It is not surprising that different choices of $\mathcal{A}$, $\mathcal{Q}$, and $\perp$ may lead to
different clusterings. In particular we will see that larger classes $\mathcal{A}$ usually result in
more distributions for which we can construct a clustering satisfying all our clustering axioms.
On the other hand, taking a larger class $\mathcal{A}$ means that more agreement needs to be sought
about the distributions having a trivial clustering (4). For this reason the choice of
$\mathcal{A}$ can be viewed as a trade-off.



Figure 1: Example of disjoint additivity for two distributions having a density.
Panels: $c(P)$, $c(P')$, and $c(P) \cup c(P')$.

Axiom (4) only describes distributions that have a trivial clustering. However, there are
also distributions for which everybody would agree on a non-trivial clustering. For example,
if $P$ is the uniform distribution on two well separated Euclidean balls $B_1$ and $B_2$, then the
"natural" clustering would be $c(P) = \{B_1, B_2\}$. Our disjoint additivity axiom generalizes
this observation by postulating

$$\operatorname{supp} P_1 \perp \operatorname{supp} P_2 \implies c(P_1 + P_2) = c(P_1) \cup c(P_2). \tag{5}$$

In other words, if $P$ consists of two spatially well separated sources $P_1$ and $P_2$, the
clustering of $P$ should reflect this spatial separation, see also Figure 1. Moreover, note that
this axiom formalizes the vague term "spatially well separated" with the help of the relation
$\perp$, which, like $\mathcal{A}$ and $\mathcal{Q}$, is a design parameter that usually influences the
nature of the clustering.

The axioms (4) and (5) only describe the horizontal behaviour of clusterings, i.e. the
depth of the clustering forest is not affected by (4) and (5). Our second additivity axiom
addresses this. To motivate it, assume that we have a $P \in \mathcal{P}$ and a base measure $Q_A$,
e.g. a uniform distribution on $A$, such that $\operatorname{supp} P \subset A$. Then adding $Q_A$
to $P$ can be viewed as pouring uniform noise over $P$. Intuitively, this uniform noise should not
affect the internal and possibly delicate clustering of $P$ but only its roots, see also Figure 2.
Our base additivity axiom formalizes this intuition by stipulating

$$\operatorname{supp} P \subset A \implies c(P + Q_A) = s(c(P) \cup \{A\}). \tag{6}$$

Here the structure operation $s(\cdot)$ is applied on the right-hand side to avoid a conflict with
the structured forest axiom (2). Also note that it is this very axiom that directs our theory
towards hierarchical clustering, since it controls the vertical growth of clusterings under a
simple operation.


Figure 2: Example of base additivity, $c(P + Q_A) = \{A\} \cup c(P)$.
Any clustering satisfying the axioms (1) to (6) will be called an additive clustering.
Now the first, and rather simple, part of our theory shows that under some mild technical
assumptions there is a unique additive clustering on the set of simple measures on forests

$$\mathcal{S}(\mathcal{A}) := \Big\{ \sum_{A \in F} \alpha_A Q_A \;\Big|\; F \subset \mathcal{A} \text{ is a forest and } \alpha_A > 0 \text{ for all } A \in F \Big\}.$$

Moreover, for $P \in \mathcal{S}(\mathcal{A})$ there is a unique representation
$P = \sum_{A \in F} \alpha_A Q_A$, and the additive clustering is given by $c(P) = s(F)$.

Unfortunately, the set $\mathcal{S}(\mathcal{A})$ of simple measures, on which the uniqueness holds,
is usually rather small. Consequently, additive clusterings on large collections $\mathcal{P}$ are
far from being uniquely determined. Intuitively, we may hope to address this issue if we
additionally impose some sort of continuity on the clusterings, i.e. an implication of the form

$$P_n \to P \implies c(P_n) \to c(P). \tag{7}$$

Indeed, having an implication of the form (7), it is straightforward to show that the clustering
is not only uniquely determined on $\mathcal{S}(\mathcal{A})$ but actually on the "closure" of
$\mathcal{S}(\mathcal{A})$. To find a formalization of (7), we first note that from a user
perspective, $c(P_n) \to c(P)$ usually describes a desired type of convergence. Following this
idea, $P_n \to P$ then describes a sufficient condition for (7) to hold. In the remainder of this
section we thus begin by presenting desirable properties of $c(P_n) \to c(P)$ and resulting
necessary conditions on $P_n \to P$.

Let us begin by assuming that all $P_n$ are contained in $\mathcal{S}(\mathcal{A})$ and let us
further denote the corresponding forests in the unique representation of $P_n$ by $F_n$. Then we
already know that $c(P_n) = s(F_n)$, so that the convergence on the right-hand side of (7) becomes

$$s(F_n) \to c(P). \tag{8}$$

Now, every $s(F_n)$, as well as $c(P)$, is a finite forest, and so a minimal requirement for (8)
is that $s(F_n)$ and $c(P)$ are graph isomorphic, at least for all sufficiently large $n$. Moreover,
we certainly also need to demand that every node in $s(F_n)$ converges to the corresponding
node in $c(P)$. To describe the latter postulate more formally, we fix graph isomorphisms
$\zeta_n : s(F_1) \to s(F_n)$ and $\zeta : s(F_1) \to c(P)$. Then our postulate reads as

$$\zeta_n(A) \to \zeta(A) \tag{9}$$

for all $A \in s(F_1)$. Of course, there exist various notions for describing convergence of
sets, e.g. in terms of the symmetric difference or the Hausdorff metric, so at this stage we
need to make a decision. To motivate our choice, we first note that (9) actually contains
two statements, namely, that $\zeta_n(A)$ converges for $n \to \infty$, and that its limit equals
$\zeta(A)$. Now recall from various branches of mathematics that definitions of continuous
extensions typically separate these two statements by considering approximating sequences that
automatically converge. Based on this observation, we decided to consider monotone sequences
in (9), i.e. we assume that $A \subset \zeta_1(A) \subset \zeta_2(A) \subset \dots$ for all
$A \in s(F_1)$. Let us denote the resulting limit forest by $F_\infty$, i.e.

$$F_\infty := \Big\{ \bigcup_{n} \zeta_n(A) \;\Big|\; A \in s(F_1) \Big\},$$

which is indeed a forest under some mild assumptions on $\mathcal{A}$ and $\perp$. Moreover,
$\zeta_\infty : s(F_1) \to F_\infty$ defined by $\zeta_\infty(A) := \bigcup_n \zeta_n(A)$ becomes a
graph isomorphism, and hence (9) reduces to

$$\zeta_\infty(A) = \zeta(A) \quad P\text{-almost surely for all } A \in s(F_1). \tag{10}$$

Summing up our considerations so far, we have seen that our demands on $c(P_n) \to c(P)$
imply some conditions on the forests associated to the sequence $(P_n)$, namely
$\zeta_n(A) \nearrow$ for all $A \in s(F_1)$. Without a formalization of $P_n \to P$, however, there
is clearly no hope that this monotone convergence alone can guarantee (7). As for (9), there
are again various ways of formalizing the convergence $P_n \to P$. To motivate our decision, we
first note that a weak continuity axiom is certainly more desirable since this would potentially
lead to more instances of clusterings. Furthermore, observe that (7) becomes weaker the stronger
the notion of $P_n \to P$ is chosen. Now, if $P_n$ and $P$ had densities $f_n$ and $f$, then one of
the strongest notions of convergence would be $f_n \nearrow f$. In the absence of densities such a
convergence can be expressed by $P_n \nearrow P$, i.e. by

$$P_n(B) \nearrow P(B) \quad \text{for all measurable } B.$$

Combining these ideas we write $(P_n, F_n) \nearrow P$ iff $P_n \nearrow P$ and there are graph
isomorphisms $\zeta_n : s(F_1) \to s(F_n)$ with $\zeta_n(A) \nearrow$ for all $A \in s(F_1)$. Our
formalization of (7) then becomes

$$(P_n, F_n) \nearrow P \implies F_\infty = c(P) \text{ in the sense of (10)}, \tag{11}$$

which should hold for all $P_n \in \mathcal{S}(\mathcal{A})$ and their representing forests $F_n$.

While it seems tempting to stipulate such a continuity axiom, it is unfortunately
inconsistent. To illustrate this inconsistency, consider, for example, the uniform distribution $P$
on $[0,1]$. Then $P$ can be approximated by the following two sequences:

$$P_n^{(1)} := \mathbf{1}_{[1/n,\,1-1/n]} P, \qquad P_n^{(2)} := \mathbf{1}_{[0,\,1/2-1/n]} P + \mathbf{1}_{[1/2,\,1]} P.$$

By (11) the first approximation would then lead to the clustering $c(P) = \{[0,1]\}$, while the
second approximation would give $c(P) = \{[0,1/2), [1/2,1]\}$.

Interestingly, this example not only shows that (11) is inconsistent but also gives a
hint how to resolve the inconsistency. Indeed, the first sequence seems to be "adapted" to the
limiting distribution, whereas the second sequence $(P_n^{(2)})$ is intuitively too complicated since
its members have two clusters rather than the anticipated one cluster. Therefore, the idea
to find a consistent alternative to (11) is to restrict the left-hand side of (11) to "adapted
sequences", so that our continuity axiom becomes

$$(P_n, F_n) \nearrow P \text{ and } P_n \text{ is } P\text{-adapted for all } n \implies F_\infty = c(P) \text{ in the sense of (10)}.$$

In simple words, our main result then states that there exists exactly one such continuous
clustering on the closure of $\mathcal{S}(\mathcal{A})$. The main message of this paper thus is:

Starting with very simple building blocks $\mathcal{Q} = \{Q_A\}_{A \in \mathcal{A}}$ for which we
(need to) agree that they only have one trivial cluster $\{A\}$, we can construct a unique additive
and continuous clustering on a rich set of distributions. Or, in other words, as soon as we have
fixed $(\mathcal{A}, \mathcal{Q})$ and a separation relation $\perp$, there is no ambiguity left as to
what a clustering is.


What is left is to explore how the choice of the clustering base $(\mathcal{A}, \mathcal{Q}, \perp)$
influences the resulting clustering. To this end, we first present various clustering bases,
which, e.g., describe the minimal thickness of clusters, their shape, and how far clusters need
to be apart from each other. For distributions having a Lebesgue density we then illustrate how
different clustering bases lead to different clusterings. Finally, we show that our approach goes
beyond density-based clusterings by considering distributions consisting of several lower
dimensional, overlapping parts.

2. Additive Clustering 

In this section we introduce base sets, separation relations, and simple measures, as well as 
the corresponding axioms for clustering. Finally, we show that there exists a unique additive 
clustering on the set of simple measures. 

Throughout this work let $\Omega = (\Omega, \mathcal{T})$ be a Hausdorff space and let
$\mathcal{B} \supset \sigma(\mathcal{T})$ be a $\sigma$-algebra that contains the Borel sets.
Furthermore, we assume that $\mathcal{M} = \mathcal{M}_\Omega$ is the set of finite, non-zero, inner
regular measures $P$ on $\Omega$. Similarly, $\mathcal{M}^\infty_\Omega$ denotes the set of non-zero
measures on $\Omega$ if $\Omega$ is a Radon space and else of non-zero, inner regular measures on
$\Omega$. In this respect, recall that any Polish space, i.e. a completely metrizable separable
space, is Radon. In particular, all open and closed subsets of $\mathbb{R}^d$ are Polish spaces and
thus Radon. For inner regular measures the support is well-defined and satisfies the usual
properties, see Appendix A for details. The set $\mathcal{M}_\Omega$ forms a cone: for all
$P, P' \in \mathcal{M}_\Omega$ and all $\alpha > 0$ we have $P + P' \in \mathcal{M}_\Omega$ and
$\alpha P \in \mathcal{M}_\Omega$.

2.1 Base Sets, Base Measures, and Separation Relations 

Intuitively, any notion of a clustering should combine aspects of concentration and
contiguousness. What is a possible core of this? On the one hand, clustering should be local in
the sense of disjoint additivity, which was presented in the introduction: if a measure $P$ is
understood on two parts of its support and these parts are nicely separated, then the clustering
should be just a union of the two local ones. Observe that in this case $\operatorname{supp} P$ is
not connected! On the other hand, in view of base clustering, base sets need to be impossible to
partition into nicely separated components. Therefore they ought to be nicely connected. Of
course, the meanings of nicely connected and nicely separated are interdependent and highly
disputable. For this reason, our notion of clustering assumes that both meanings are specified
up front, e.g. by the user. Provided that both meanings satisfy certain technical criteria, we then
show that there exists exactly one clustering. To motivate what these technical criteria may
look like, let us recall that for all connected sets $A$ and all closed sets $B_1, \dots, B_k$ we have

$$A \subset B_1 \,\dot\cup\, \dots \,\dot\cup\, B_k \implies \exists!\, i \le k : A \subset B_i. \tag{12}$$

The left-hand side here contains the condition that the $B_1, \dots, B_k$ are pairwise disjoint,
for which we already introduced the following notation:

$$B \perp_\emptyset B' \;:\Longleftrightarrow\; B \cap B' = \emptyset.$$

In order to transfer the notion of connectedness to other relations it is handy to generalize
the notation $B_1 \,\dot\cup\, \dots \,\dot\cup\, B_k$. To this end, let $\perp$ be a relation on
subsets of $\Omega$. Then we denote the union $B_1 \cup \dots \cup B_k$ of some
$B_1, \dots, B_k \subset \Omega$ by

$$B_1 \overset{\perp}{\cup} \dots \overset{\perp}{\cup} B_k$$

iff we have $B_i \perp B_j$ for all $i \ne j$. Now the key idea of the next definition is to
generalize the notion of connectivity and separation by replacing $\perp_\emptyset$ in (12) by
another suitable relation.

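Definition 1 Let $\mathcal{A} \subset \mathcal{B}$ be a collection of closed, non-empty sets. A
symmetric relation $\perp$ defined on $\mathcal{B}$ is called an $\mathcal{A}$-separation relation
iff the following holds:

(a) Reflexivity: For all $B \in \mathcal{B}$: $B \perp B \implies B = \emptyset$.

(b) Monotonicity: For all $A, A', B \in \mathcal{B}$:

$$A \subset A' \text{ and } A' \perp B \implies A \perp B.$$

(c) $\mathcal{A}$-Connectedness: For all $A \in \mathcal{A}$ and all closed $B_1, \dots, B_k \in \mathcal{B}$:

$$A \subset B_1 \overset{\perp}{\cup} \dots \overset{\perp}{\cup} B_k \implies \exists\, i \le k : A \subset B_i.$$

Moreover, an $\mathcal{A}$-separation relation $\perp$ is called stable iff for all
$A_1 \subset A_2 \subset \dots$ with $A_n \in \mathcal{A}$, all $n \ge 1$, and all $B \in \mathcal{B}$:

$$A_n \perp B \text{ for all } n \ge 1 \implies \bigcup_{n \ge 1} A_n \perp B. \tag{13}$$

Finally, given a separation relation $\perp$, we say that $B, B'$ are $\perp$-separated if
$B \perp B'$. We write $B \mathrel{\circ\!\circ} B'$ iff not $B \perp B'$, and say in this case that
$B, B'$ are $\perp$-connected.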

It is not hard to check that the disjointness relation $\perp_\emptyset$ is a stable
$\mathcal{A}$-separation relation whenever all $A \in \mathcal{A}$ are topologically connected. To
present another example of a separation relation, we fix a metric $d$ on $\Omega$ and some $r > 0$.
Moreover, for $B, B' \subset \Omega$ we write

$$B \perp_r B' \;:\Longleftrightarrow\; d(B, B') \ge r.$$

In addition, recall that a $B \subset \Omega$ is $r$-connected if, for all $x, x' \in B$, there
exist $x_0, \dots, x_n \in B$ with $x_0 = x$, $x_n = x'$, and $d(x_{i-1}, x_i) < r$ for all
$i = 1, \dots, n$. Then it is easy to show that $\perp_r$ is a stable $\mathcal{A}$-separation
relation if all $A \in \mathcal{A}$ are $r$-connected. For more examples
of separation relations we refer to Section 4.1.
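
The relation $\perp_r$ and the notion of $r$-connectedness can be tried out on finite point sets. The following is a minimal sketch, assuming Euclidean distance on tuples and taking $d(B, B')$ to be the smallest pairwise distance; the function names and the example sets are hypothetical and not part of the paper.

```python
import math
from itertools import product


def set_distance(B, C):
    """d(B, C): smallest Euclidean distance between points of two finite sets."""
    if not B or not C:
        return math.inf  # infimum over an empty family
    return min(math.dist(x, y) for x, y in product(B, C))


def r_separated(B, C, r):
    """B and C are r-separated iff d(B, C) >= r."""
    return set_distance(B, C) >= r


def r_connected(B, r):
    """True iff any two points of B are linked by a chain with steps shorter than r."""
    points = list(B)
    if not points:
        return True
    seen, stack = {points[0]}, [points[0]]
    while stack:  # depth-first search on the graph with edges d(x, y) < r
        x = stack.pop()
        for y in points:
            if y not in seen and math.dist(x, y) < r:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(points)


# Hypothetical example: A is 0.6-connected, B1 and B2 are 0.6-separated.
A = {(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)}
B1 = A | {(1.0, 0.5)}
B2 = {(3.0, 0.0), (3.5, 0.0)}
```

In this toy setting, an $r$-connected set contained in the union of two $r$-separated sets has to lie in one of them, which is the finite analogue of the connectedness condition in Definition 1.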

It can be shown that $\perp_\emptyset$ is the weakest separation relation, i.e. for every
$\mathcal{A}$-separation relation $\perp$ we have $A \perp A' \implies A \perp_\emptyset A'$ for all
$A, A' \in \mathcal{A}$. We refer to Lemma 30, also showing that $\perp$-unions are unique, i.e.,
for all $A_1, \dots, A_k$ and all $A'_1, \dots, A'_{k'}$ in $\mathcal{A}$ we have

$$A_1 \overset{\perp}{\cup} \dots \overset{\perp}{\cup} A_k = A'_1 \overset{\perp}{\cup} \dots \overset{\perp}{\cup} A'_{k'} \implies \{A_1, \dots, A_k\} = \{A'_1, \dots, A'_{k'}\}.$$

Finally, the stability implication (13) is trivially satisfied for finite sequences
$A_1 \subset \dots \subset A_m$ in $\mathcal{A}$, since in this case we have
$A_1 \cup \dots \cup A_m = A_m$. For this reason stability will only become important when we
consider limits in Section 3.

We can now describe the properties a clustering base should satisfy. 
Definition 2 A (stable) clustering base is a triple $(\mathcal{A}, \mathcal{Q}, \perp)$ where
$\mathcal{A} \subset \mathcal{B} \setminus \{\emptyset\}$ is a class of non-empty sets, $\perp$ is a
(stable) $\mathcal{A}$-separation relation, and
$\mathcal{Q} = \{Q_A\}_{A \in \mathcal{A}} \subset \mathcal{M}$ is a family of probability measures
on $\Omega$ with the following properties:

(a) Flatness: For all $A, A' \in \mathcal{A}$ with $A \subset A'$ we either have $Q_{A'}(A) = 0$ or

$$Q_A(\cdot) = \frac{Q_{A'}(\cdot \cap A)}{Q_{A'}(A)}.$$

(b) Fittedness: For all $A \in \mathcal{A}$ we have $A = \operatorname{supp} Q_A$.

We call a set $A$ a base set iff $A \in \mathcal{A}$, and a measure $a \in \mathcal{M}$ a base
measure on $A$ iff $A \in \mathcal{A}$ and there is an $\alpha > 0$ with $a = \alpha Q_A$.


Let us motivate the two conditions of clustering bases.

Flatness concerns nesting of base sets: let $A \subset A'$ be base sets and consider the sum of
their base measures $Q_A + Q_{A'}$. If the clustering base is not flat, weird things can happen;
see the illustration on the right. The way we defined flatness excludes such cases
without taking densities into account. As a result we will be able to handle aggregations
of measures of different Hausdorff dimension in Section 4.3. Fittedness, on the other hand,
establishes a link between the sets $A \in \mathcal{A}$ and their associated base measures.

Probably the easiest example of a clustering base has measures of the form

$$Q_A(\cdot) = \frac{\mu(\cdot \cap A)}{\mu(A)} = \frac{\mathbf{1}_A \, d\mu}{\mu(A)}, \tag{14}$$

where $\mu$ is some reference measure independent of $Q_A$. The next proposition shows that
under mild technical assumptions such distributions do indeed provide a clustering base.


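Proposition 3 Let $\mu \in \mathcal{M}^\infty_\Omega$ and $\perp$ be a (stable)
$\mathcal{A}$-separation relation for some $\mathcal{A} \subset \mathcal{K}(\mu)$, where

$$\mathcal{K}(\mu) := \{ C \in \mathcal{B} \mid 0 < \mu(C) < \infty \text{ and } C = \operatorname{supp} \mu(\cdot \cap C) \}$$

denotes the set of $\mu$-support sets. We write
$\mathcal{Q}^{\mu,\mathcal{A}} := \{ Q_A \mid A \in \mathcal{A} \}$, where $Q_A$ is defined
by (14). Then $(\mathcal{A}, \mathcal{Q}^{\mu,\mathcal{A}}, \perp)$ is a (stable) clustering base.

Interestingly, distributions of the form (14) are not the only examples for clustering
bases. For further details we refer to Section 4.3, where we discuss distributions supported
by sets of different Hausdorff dimension.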
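
To make the construction (14) and the flatness condition concrete, the following is a minimal sketch on a finite space, assuming the reference measure is given as a dictionary of point masses. The helper names (`base_measure`, `is_flat_pair`, and so on) are hypothetical, and exact rational arithmetic is used only so that the equality check is reliable.

```python
from fractions import Fraction


def base_measure(mu, A):
    """Q_A = mu(. & A) / mu(A) for a finitely supported reference measure mu."""
    mass = sum(mu.get(x, 0) for x in A)
    if mass <= 0:
        raise ValueError("A must have positive mass under mu")
    return {x: Fraction(mu[x]) / mass for x in A if mu.get(x, 0) > 0}


def measure(Q, B):
    """Q(B) for a measure stored as point masses."""
    return sum((w for x, w in Q.items() if x in B), Fraction(0))


def support(Q):
    """Support of a finitely supported measure."""
    return {x for x, w in Q.items() if w > 0}


def is_flat_pair(Q_A, A, Q_Ap):
    """Flatness for A contained in A': Q_A'(A) = 0 or Q_A = Q_A'(. & A) / Q_A'(A)."""
    m = measure(Q_Ap, A)
    if m == 0:
        return True
    conditioned = {x: w / m for x, w in Q_Ap.items() if x in A}
    return conditioned == Q_A


# Hypothetical reference measure with point masses 1, 2, 3, 4.
mu = {0: 1, 1: 2, 2: 3, 3: 4}
A, A_prime = {1, 2}, {0, 1, 2, 3}
Q_A, Q_Ap = base_measure(mu, A), base_measure(mu, A_prime)
skewed = {1: Fraction(1, 2), 2: Fraction(1, 2)}  # not derived from mu
```

Measures built from one common reference measure pass the flatness check, whereas `skewed`, which weights the points of `A` differently from `mu`, does not.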

2.2 Forests, Structure, and Clustering 

As outlined in the introduction we are interested in hierarchical clusterings, i.e. in clusterings
that map a finite measure to a forest of sets. In this section we therefore recall some
fundamental definitions and notations for such forests.
Definition 4 Let $\mathcal{A}$ be a class of closed, non-empty sets, $\perp$ be an
$\mathcal{A}$-separation relation, and $\mathcal{C}$ be a class with
$\mathcal{A} \subset \mathcal{C} \subset \mathcal{B} \setminus \{\emptyset\}$. We say that a
non-empty $F \subset \mathcal{C}$ is a ($\mathcal{C}$-valued) $\perp$-forest iff

$$A, A' \in F \implies A \perp A' \text{ or } A \subset A' \text{ or } A' \subset A.$$

We denote the set of all such finite forests by $\mathcal{F}_{\mathcal{C}}$ and write
$\mathcal{F} := \mathcal{F}_{\mathcal{B} \setminus \{\emptyset\}}$.

A finite $\perp$-forest $F \in \mathcal{F}$ is partially ordered by the inclusion relation. The
maximal elements $\max F := \{A \in F : \nexists A' \in F \text{ s.t. } A \subsetneq A'\}$ are called
roots and the minimal elements
$\min F := \{A \in F : \nexists A' \in F \text{ s.t. } A' \subsetneq A\}$ are called leaves. It is
not hard to see that $A \perp A'$ whenever $A, A' \in F$ is a pair of roots or leaves. Moreover,
the ground of $F$ is

$$G(F) := \bigcup_{A \in F} A,$$

that is, $G(F)$ equals the union over the roots of $F$. Finally, $F$ is a tree iff it has only a
single root, or equivalently, $G(F) \in F$, and $F$ is a chain iff it has a single leaf, or
equivalently, iff it is totally ordered.
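
The notions of Definition 4 are easy to experiment with when sets are finite. The following is a minimal sketch, assuming forests are stored as Python sets of `frozenset` objects and that the separation relation is plain disjointness; all function names are hypothetical.

```python
def disjoint(A, B):
    """The disjointness relation, used here as the default separation relation."""
    return A.isdisjoint(B)


def is_forest(F, separated=disjoint):
    """Any two members are separated or nested."""
    F = list(F)
    return all(
        separated(A, B) or A <= B or B <= A
        for i, A in enumerate(F)
        for B in F[i + 1:]
    )


def roots(F):
    """Maximal elements with respect to inclusion."""
    return {A for A in F if not any(A < B for B in F)}


def leaves(F):
    """Minimal elements with respect to inclusion."""
    return {A for A in F if not any(B < A for B in F)}


def ground(F):
    """Union of all members of F."""
    return frozenset().union(*F)


def is_tree(F):
    return len(roots(F)) == 1


def is_chain(F):
    return len(leaves(F)) == 1


# Hypothetical forest: one tree with two leaves, plus an isolated root.
fs = frozenset
F = {fs({1, 2, 3, 4}), fs({1, 2}), fs({3, 4}), fs({5})}
```

On this example the ground equals the union over the two roots, and the forest is neither a tree nor a chain.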

In addition to these standard notions, we often need a notation for describing certain
sub-forests. Namely, for a finite forest $F \in \mathcal{F}$ with $A \in F$ we write

$$F|_{\supsetneq A} := \{A' \in F \mid A' \supsetneq A\}$$

for the chain of strict ancestors of $A$. Analogously, we will use the notations
$F|_{\supset A}$, $F|_{\subset A}$, and $F|_{\subsetneq A}$ for the chain of ancestors of $A$
(including $A$), the tree of descendants of $A$ (including $A$), and the finite forest of strict
descendants of $A$, respectively. We refer to Figure 3 for an example of these notations.

Definition 5 Let $F$ be a finite forest. Then we call $A_1, A_2 \in F$ direct siblings iff
$A_1 \ne A_2$ and they have the same strict ancestors, i.e.
$F|_{\supsetneq A_1} = F|_{\supsetneq A_2}$. In this case, any element

$$A' \in \min F|_{\supsetneq A_1} = \min F|_{\supsetneq A_2}$$

is called a direct parent of $A_1$ and $A_2$. On the other hand, for $A, A' \in F$ we call $A'$
a direct child of $A$ iff

$$A' \in \max F|_{\subsetneq A}.$$

Moreover, the structure of $F$ is defined by

$$s(F) := \{ A \in F \mid A \text{ is a root or it has a direct sibling } A' \in F \}$$

and $F$ is a structured forest iff $F = s(F)$.

For later use we note that direct siblings $A_1, A_2$ in a $\perp$-forest $F$ always satisfy
$A_1 \perp A_2$. Moreover, the structure of a forest is obtained by pruning all sub-chains in $F$,
see Figure 3. We further note that $s(s(F)) = s(F)$ for all forests, and if $F, F'$ are structured
$\perp$-forests with $G(F) \perp G(F')$ then we have $s(F \cup F') = F \cup F'$.
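
The structure operation can be computed directly from Definition 5. The sketch below assumes a finite forest stored as a set of `frozenset` objects and reads "direct sibling" as a distinct member with the same strict ancestors; the function names and the example forest are hypothetical.

```python
def strict_ancestors(F, A):
    """All members of F that strictly contain A."""
    return frozenset(B for B in F if A < B)


def structure(F):
    """s(F): keep the roots and every member that has a direct sibling."""
    F = set(F)
    anc = {A: strict_ancestors(F, A) for A in F}
    return {
        A
        for A in F
        if not anc[A]  # A is a root
        or any(B != A and anc[B] == anc[A] for B in F)  # A has a direct sibling
    }


# Hypothetical forest: R contains M, and M branches into C1 and C2.
fs = frozenset
R, M = fs(range(1, 7)), fs({1, 2, 3, 4})
C1, C2 = fs({1, 2}), fs({3, 4})
F = {R, M, C1, C2}
S = structure(F)
```

Here `M` is pruned because it merely continues the chain below `R`, while the branching into `C1` and `C2` is kept. The two identities noted above, $s(s(F)) = s(F)$ and $s(F \cup F') = F \cup F'$ for structured forests with separated grounds, can be checked on small examples in the same way.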

Let us now present our first set of axioms for (hierarchical) clustering. 
Figure 3: Illustrations of a forest $F$ and of its structure $s(F)$.


Axiom 1 (Clustering) Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and
$\mathcal{P} \subset \mathcal{M}_\Omega$ be a set of measures with
$\mathcal{Q} \subset \mathcal{P}$. A map $c : \mathcal{P} \to \mathcal{F}$ is called an
$\mathcal{A}$-clustering if it satisfies:

(a) Structured: For all $P \in \mathcal{P}$ the forest $c(P)$ is structured, i.e. $c(P) = s(c(P))$.

(b) Scale invariance: For all $P \in \mathcal{P}$ and $\alpha > 0$ we have
$\alpha P \in \mathcal{P}$ and $c(\alpha P) = c(P)$.

(c) Base measure clustering: For all $A \in \mathcal{A}$ we have $c(Q_A) = \{A\}$.

Note that the scale invariance is solely for notational convenience. Indeed, we could
have defined clusterings for distributions only, in which case the scale invariance would have
been obsolete. Moreover, assuming that a clustering produces structured forests essentially
means that the clustering is only interested in the skeleton of the cluster forest. Finally, the
axiom of base measure clustering means that we have a set of elementary measures, namely
the base measures, for which we have already agreed that they can only be clustered in a
trivial way. In Section 4 we will present a couple of examples of
$(\mathcal{A}, \mathcal{Q}, \perp)$ for which such an agreement is possible. Note also that these
axioms guarantee that if $c : \mathcal{P} \to \mathcal{F}$ is a clustering and $a$ is a base measure
on $A$, then $a \in \mathcal{P}$ and $c(a) = \{A\}$.

2.3 Additive Clustering 

So far our axioms only determine the clusterings for base measures. Therefore, the goal of
this subsection is to describe the behaviour of clusterings on certain combinations of
measures. Furthermore, we will show that the axioms describing this behaviour are consistent
and uniquely determine a hierarchical clustering on a certain set of measures induced by
$\mathcal{Q}$.

Let us begin by introducing the axioms of additivity which we have already described 
and motivated in the introduction. 

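Axiom 2 (Additive Clustering) Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and
$\mathcal{P} \subset \mathcal{M}_\Omega$ be a set of measures with
$\mathcal{Q} \subset \mathcal{P}$. A clustering $c : \mathcal{P} \to \mathcal{F}$ is called additive
iff the following conditions are satisfied:

(a) Disjoint Additivity: For all $P_1, \dots, P_k \in \mathcal{P}$ with mutually $\perp$-separated
supports, i.e. $\operatorname{supp} P_i \perp \operatorname{supp} P_j$ for all $i \ne j$, we have
$P_1 + \dots + P_k \in \mathcal{P}$ and

$$c(P_1 + \dots + P_k) = c(P_1) \cup \dots \cup c(P_k).$$

(b) Base Additivity: For all $P \in \mathcal{P}$ and all base measures $a$ with
$\operatorname{supp} P \subset \operatorname{supp} a$ we have $a + P \in \mathcal{P}$ and

$$c(a + P) = s(\{\operatorname{supp} a\} \cup c(P)).$$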
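
The effect of the two additivity rules on the level of forests can be sketched as follows. This is only an illustration under simplifying assumptions: clusterings are finite sets of `frozenset` objects, the ground of a clustering stands in for the support of the measure, and the caller is responsible for passing clusterings with separated supports to `disjoint_add`. All names are hypothetical.

```python
def structure(F):
    """s(F): keep roots and members that have a direct sibling (cf. Definition 5)."""
    F = set(F)
    anc = {A: frozenset(B for B in F if A < B) for A in F}
    return {
        A
        for A in F
        if not anc[A] or any(B != A and anc[B] == anc[A] for B in F)
    }


def disjoint_add(*clusterings):
    """Disjoint additivity: the clustering of the sum is the union of the clusterings."""
    return set().union(*clusterings)


def base_add(clustering, A):
    """Base additivity: add the base set A as a new root, then restructure."""
    ground = frozenset().union(*clustering)
    if not ground <= A:
        raise ValueError("the base set must contain the ground of the clustering")
    return structure(set(clustering) | {A})


# Hypothetical example: two separated clusters and a base set covering both.
fs = frozenset
B1, B2, A = fs({1, 2}), fs({4, 5}), fs(range(0, 7))
two_clusters = disjoint_add({B1}, {B2})
```

Adding the base set on top of two clusters yields a tree with root `A` and children `B1`, `B2`, whereas adding it on top of a single cluster produces a chain that the structure operation collapses to `{A}`. This matches the remark in the introduction that base additivity only affects the roots.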
Our next goal is to show that there exist additive clusterings and that these are uniquely
determined on a set $\mathcal{S}$ of measures that, in some sense, is spanned by $\mathcal{Q}$. The
following definition introduces this set.

Definition 6 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and
$F \in \mathcal{F}_{\mathcal{A}}$ be an $\mathcal{A}$-valued finite $\perp$-forest.
A measure $Q$ is simple on $F$ iff there exist base measures $a_A$ on $A \in F$ such that

$$Q = \sum_{A \in F} a_A. \tag{15}$$

We denote the set of all simple measures with respect to $(\mathcal{A}, \mathcal{Q}, \perp)$ by
$\mathcal{S} := \mathcal{S}(\mathcal{A})$.

Figure 4 provides an example of a simple measure. The next lemma shows that the
representation (15) of simple measures is actually unique.

Lemma 7 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and
$Q \in \mathcal{S}(\mathcal{A})$. Then there exists exactly one $F_Q \in \mathcal{F}_{\mathcal{A}}$
such that $Q$ is simple on $F_Q$. Moreover, the representing base measures in (15)
are also unique and we have $\operatorname{supp} Q = G(F_Q)$.

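Using Lemma 7 we can now define certain restrictions of simple measures
$Q \in \mathcal{S}(\mathcal{A})$ with representation (15). Namely, any subset $F' \subset F$ gives
a measure

$$Q|_{F'} := \sum_{A \in F'} a_A.$$

We write $Q|_{\supsetneq A} := Q|_{F|_{\supsetneq A}}$ and similarly $Q|_{\supset A}$,
$Q|_{\subset A}$, $Q|_{\subsetneq A}$.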

With the help of Lemma 7 it is now easy to explain what a possible additive clustering
could look like on $\mathcal{S}(\mathcal{A})$. Indeed, for a $Q \in \mathcal{S}(\mathcal{A})$,
Lemma 7 provides a unique finite forest $F_Q \in \mathcal{F}_{\mathcal{A}}$ that represents $Q$, and
therefore the structure $s(F_Q)$ is a natural candidate for a clustering of $Q$. The next theorem
shows that this idea indeed leads to an additive clustering and that every additive clustering
on $\mathcal{S}(\mathcal{A})$ retrieves the structure of the underlying forest of a simple measure.

Theorem 8 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and
$\mathcal{S}(\mathcal{A})$ the set of simple measures. Then we can define an additive
$\mathcal{A}$-clustering $c : \mathcal{S}(\mathcal{A}) \to \mathcal{F}_{\mathcal{A}}$ by

$$c(Q) := s(F_Q), \qquad Q \in \mathcal{S}(\mathcal{A}). \tag{16}$$

Moreover, every additive $\mathcal{A}$-clustering $c : \mathcal{P} \to \mathcal{F}$ satisfies both
$\mathcal{S}(\mathcal{A}) \subset \mathcal{P}$ and (16).

3. Continuous Clustering 

As described in the introduction, we typically need, besides additivity, also some notion
of continuity for clusterings. The goal of this section is to introduce such a notion and to
show that, similarly to Theorem 8, this continuity uniquely defines a clustering on a suitably
defined extension of $\mathcal{S}(\mathcal{A})$.




Figure 4: Simple measure.
To this end, we first introduce a notion of monotone convergence for sequences of simple
measures that does not change the graph structure of the corresponding clusterings given
by Theorem 8. We then discuss a richness property of the clustering base, which essentially
ensures that we can approximate the non-disjoint union of two base sets by another base set.
In the next step we describe monotone sequences of simple measures that are in some sense
adapted to the limiting distribution. In the final part of this section we then axiomatically
describe continuous clusterings and show both their existence and their uniqueness.

3.1 Isomonotone Limits 

The goal of this section is to introduce a notion of monotone convergence for simple measures 
that preserves the graph structure of the corresponding clusterings. 

Our first step in this direction is the following definition, which introduces a sort
of monotonicity for set-valued isomorphic forests.

Definition 9 Let $F, F' \in \mathcal{F}$ be two finite forests. Then $F$ and $F'$ are isomorphic, denoted
by $F \cong F'$, iff there is a bijection $\zeta : F \to F'$ such that for all $A, A' \in F$ we have:

$$A \subset A' \iff \zeta(A) \subset \zeta(A'). \qquad (17)$$

Moreover, we write $F \le F'$ iff $F \cong F'$ and there is a map $\zeta : F \to F'$ satisfying (17) and

$$A \subset \zeta(A), \qquad A \in F. \qquad (18)$$

In this case, the map $\zeta$, which is uniquely determined by (17), (18) and the fact that $F$ and
$F'$ are finite, is called the forest relating map (FRM) between $F$ and $F'$.

Forests can be viewed as directed acyclic graphs: there is an edge between $A$ and $A'$ in $F$
iff $A \subset A'$ and no other node is in between. Then $F \cong F'$ holds iff $F$ and $F'$ are isomorphic
as directed graphs. From this it becomes clear that $\cong$ is an equivalence relation. Moreover,
the relation $F \le F'$ means that each node $A$ of $F$ can be graph isomorphically mapped to
a node of $F'$ that contains $A$, see Figure 5 for an illustration. Note that $\le$ is a partial order
on $\mathcal{F}$ and in particular it is transitive. Consequently, if we have finite forests $F_1 \le \dots \le F_k$
then $F_1 \le F_k$ and there is an FRM $\zeta : F_1 \to F_k$. This observation is used in the following
definition, which introduces monotone sequences of forests and their limit.
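For small finite forests, conditions (17) and (18) can be checked by brute force. The following Python sketch is only an illustration under our own assumptions: forests are modelled as lists of `frozenset` objects over a finite ground set, and the names `satisfies_17` and `forest_relating_map` as well as the toy forests `F`, `G` are hypothetical and do not come from the paper.

```python
from itertools import permutations

def satisfies_17(F, zeta):
    """Condition (17): A is a subset of A' iff zeta(A) is a subset of zeta(A')."""
    return all((A <= B) == (zeta[A] <= zeta[B]) for A in F for B in F)

def forest_relating_map(F, G):
    """Brute-force search for a bijection zeta: F -> G with (17) and (18).

    F and G are finite forests given as collections of frozensets.
    Returns the map as a dict, or None if no such map exists.
    """
    F, G = list(F), list(G)
    if len(F) != len(G):
        return None
    for image in permutations(G):
        zeta = dict(zip(F, image))
        # (18): every node must be contained in its image.
        if satisfies_17(F, zeta) and all(A <= zeta[A] for A in F):
            return zeta
    return None

# Hypothetical toy forests on the ground set {0, ..., 5}.
F = [frozenset({0, 1, 2, 3}), frozenset({0}), frozenset({3})]
G = [frozenset(range(6)), frozenset({0, 1}), frozenset({3, 4})]
zeta = forest_relating_map(F, G)
```

Here the root of `F` is mapped to the root of `G` and each leaf to the leaf containing it. The search is factorial in the number of nodes, so it is meant for toy examples only.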

Definition 10 An isomonotone sequence of forests is defined as a sequence of finite
forests $(F_n)_n \subset \mathcal{F}$ such that $s(F_n) \le s(F_{n+1})$ for all $n \ge 1$. If this is the case, we define the
limit by

$$F_\infty := \lim_{n \to \infty} s(F_n) := \Big\{ \bigcup_{n \ge 1} \zeta_n(A) \;\Big|\; A \in s(F_1) \Big\},$$

where $\zeta_n : s(F_1) \to s(F_n)$ is the FRM obtained from $s(F_1) \le s(F_n)$.
It is easy to see that in general, the limit forest of an
isomonotone sequence of $\mathcal{A}$-valued forests is not $\mathcal{A}$-valued.
To describe the values of $F_\infty$ we define the monotone closure
of an $\mathcal{A} \subset \mathcal{B}$ by

$$\bar{\mathcal{A}} := \Big\{ \bigcup_{n \ge 1} A_n \;\Big|\; A_n \in \mathcal{A} \text{ and } A_1 \subset A_2 \subset \dots \Big\}.$$

Figure 5: $F \le F'$ and the arrows indicate the forest relating map.

The next lemma states some useful properties of $\bar{\mathcal{A}}$ and $F_\infty$.


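Lemma 11 Let $\perp$ be an $\mathcal{A}$-separation relation. Then $\perp$ is actually an $\bar{\mathcal{A}}$-separation relation.
Moreover, if $\perp$ is stable and $(F_n) \subset \mathcal{F}_{\mathcal{A}}$ is an isomonotone sequence then $F_\infty := \lim_n s(F_n)$
is an $\bar{\mathcal{A}}$-valued $\perp$-forest and we have $s(F_n) \le F_\infty$ for all $n \ge 1$.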

Unlike forests, it is straightforward to compare two measures $Q_1$ and $Q_2$ on $\mathcal{B}$. Indeed,
we say that $Q_2$ majorizes $Q_1$, in symbols $Q_1 \le Q_2$, iff

$$Q_1(B) \le Q_2(B), \qquad \text{for all } B \in \mathcal{B}.$$

For $(Q_n) \subset \mathcal{M}$ and $P \in \mathcal{M}$, we similarly speak of monotone convergence $Q_n \uparrow P$ iff
$Q_1 \le Q_2 \le \dots \le P$ and

$$\lim_{n \to \infty} Q_n(B) = P(B), \qquad \text{for all } B \in \mathcal{B}.$$

Clearly, $Q \le Q'$ implies $\operatorname{supp} Q \subset \operatorname{supp} Q'$ and it is easy to show that $Q_n \uparrow P$ implies

$$P\Big( \operatorname{supp} P \setminus \bigcup_{n \ge 1} \operatorname{supp} Q_n \Big) = 0.$$

We will use such arguments throughout this section. For example, if $a, a'$ are base measures
on $A, A'$ with $a \le a'$ then $A \subset A'$. With the help of these preparations we can now define
isomonotone convergence of simple measures.

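Definition 12 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and $(Q_n) \subset \mathcal{S}(\mathcal{A})$ be a sequence of simple
measures on finite forests $(F_n) \subset \mathcal{F}_{\mathcal{A}}$. Then isomonotone convergence, denoted by
$(Q_n, F_n) \uparrow P$, means that both

$$Q_n \uparrow P \quad \text{and} \quad s(F_1) \le s(F_2) \le \dots .$$

In addition, $\bar{\mathcal{S}} := \bar{\mathcal{S}}(\mathcal{A})$ denotes the set of all isomonotone limits, i.e.

$$\bar{\mathcal{S}}(\mathcal{A}) = \{ P \in \mathcal{M} \mid (Q_n, F_n) \uparrow P \text{ for some } (Q_n) \subset \mathcal{S}(\mathcal{A}) \text{ on } (F_n) \subset \mathcal{F}_{\mathcal{A}} \}.$$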

For a measure $P \in \bar{\mathcal{S}}(\mathcal{A})$ it is probably tempting to define its clustering by $c(P) :=
\lim_n s(F_n)$, where $(Q_n, F_n) \uparrow P$ is some isomonotone sequence. Unfortunately, such an
approach does not yield a well-defined clustering as we have discussed in the introduction.
For this reason, we need to develop some tools that help us to distinguish between “good”
and “bad” isomonotone approximations. This is the goal of the following two subsections.
3.2 Kinship and Subadditivity

In this subsection we present and discuss a technical assumption on a clustering base that 
will make it possible to obtain unique continuous clusterings. 

Let us begin by introducing a notation that will be frequently used in the following. To
this end, we fix a clustering base $(\mathcal{A}, \mathcal{Q}, \perp)$ and a $P \in \mathcal{M}$. For $B \in \mathcal{B}$ we then define

$$\mathcal{Q}_P(B) := \{ \alpha Q_A \mid \alpha > 0,\ A \in \mathcal{A},\ B \subset A,\ \alpha Q_A \le P \},$$

i.e. $\mathcal{Q}_P(B)$ denotes the set of all base measures below $P$ whose support contains $B$. Now,
our first definition describes events that can be combined in a base set:

Definition 13 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and $P \in \mathcal{M}$. Two non-empty $B, B' \in \mathcal{B}$
are called kin below $P$, denoted as $B \sim_P B'$, iff $\mathcal{Q}_P(B \cup B') \ne \emptyset$, i.e., iff there is a base
measure $a \in \mathcal{Q}$ such that the following holds:

(a) $B \cup B' \subset \operatorname{supp} a$, (b) $a \le P$.

Moreover, we say that every such $a \in \mathcal{Q}_P(B \cup B')$ supports $B$ and $B'$ below $P$.

Kinship of two events can be used to test whether they belong to the same root in the
cluster forest. To illustrate this we consider two events $B$ and $B'$ with $B \not\sim_P B'$. Moreover,
assume that there is an $A \in \mathcal{A}$ with $B \cup B' \subset A$. Then $B \not\sim_P B'$ implies that for
all such $A$ there is no $\alpha > 0$ with $\alpha Q_A \le P$. This situation is displayed on the right-hand
side of Figure 6. Now assume that we have two base measures $a, a' \le P$ on $A, A' \in \mathcal{A}$ that
satisfy $A \sim_P A'$ and $P(A \cap A') > 0$. If $\mathcal{A}$ is rich in the sense of $A \cup A' \in \mathcal{A}$, then we can
find a base measure $b$ on $B := A \cup A'$ with $a \le b \le P$ or $a' \le b \le P$. The next definition
relaxes the requirement $A \cup A' \in \mathcal{A}$, see also Figure 7 for an illustration.
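On a finite ground set, kinship below $P$ can be tested directly. The following Python sketch rests on assumptions that are ours and not the paper's: the ground set is finite, $P$ is given by its point masses, and every base measure $Q_A$ is the uniform probability measure on $A$, so that $\alpha Q_A \le P$ holds iff $\alpha / |A| \le P(\{x\})$ for all $x \in A$. The names `base_measures_below`, `kin_below`, `intervals` and the measure `P` are hypothetical.

```python
def base_measures_below(P, base_sets, B):
    """Sketch of Q_P(B) for uniform base measures on a finite ground set.

    P maps points to their mass. Returns pairs (A, alpha_max), where alpha_max
    is the largest alpha with alpha * Q_A <= P, for all base sets A containing B.
    """
    found = []
    for A in base_sets:
        if B <= A:
            # alpha * Q_A <= P holds iff alpha / |A| <= P({x}) for every x in A.
            alpha_max = len(A) * min(P.get(x, 0.0) for x in A)
            if alpha_max > 0:
                found.append((A, alpha_max))
    return found

def kin_below(P, base_sets, B1, B2):
    """B1 and B2 are kin below P iff Q_P(B1 | B2) is non-empty."""
    return bool(base_measures_below(P, base_sets, B1 | B2))

# Hypothetical example: base sets are the integer intervals inside {0, ..., 5}.
intervals = [frozenset(range(i, j)) for i in range(6) for j in range(i + 1, 7)]
P = {0: 0.2, 1: 0.2, 2: 0.0, 3: 0.3, 4: 0.3, 5: 0.0}
```

With this `P`, the events `{0}` and `{1}` are kin below `P` (the interval `{0, 1}` carries a base measure below `P`), whereas `{1}` and `{3}` are not, because every interval containing both also contains the point `2`, which has no mass.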

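Definition 14 Let $P \in \mathcal{M}_\Omega$ be a measure. For $B, B' \in \mathcal{B}$ we write

$$B \perp\!\!\!\perp_P B' \;:\iff\; P(B \cap B') = 0 \quad \text{and}$$

$$B \mathrel{\circ\!\circ}_P B' \;:\iff\; P(B \cap B') > 0 .$$

Moreover, a clustering base $(\mathcal{A}, \mathcal{Q}, \perp)$ is called $P$-subadditive iff for all base measures
$a, a' \le P$ on $A, A' \in \mathcal{A}$ we have

$$A \mathrel{\circ\!\circ}_P A' \;\Longrightarrow\; \exists\, b \in \mathcal{Q}_P(A \cup A') : b \ge a \text{ or } b \ge a'. \qquad (19)$$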

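Note that the implication (19) in particular ensures $\mathcal{Q}_P(A \cup A') \ne \emptyset$, i.e. $A \sim_P A'$.
Moreover, the relation $\perp\!\!\!\perp_P$ is weaker than any separation relation $\perp$ since we obviously
have $A \mathrel{\circ\!\circ}_P A' \Rightarrow A \mathrel{\circ\!\circ}_\emptyset A' \Rightarrow A \mathrel{\circ\!\circ} A'$, where the second implication is
shown in Lemma 30. The following definition introduces a stronger notion of additivity.

Figure 6: Kinship. (Labels in the figure: $A_1 \sim_P A_2$ and $A_1 \not\sim_P A_3$.)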
Definition 15 Let $\circ\!\circ$ be a relation on $\mathcal{B}$. An $\mathcal{A} \subset \mathcal{B}$ is $\circ\!\circ$-additive iff for all $A, A' \in \mathcal{A}$

$$A \mathrel{\circ\!\circ} A' \;\Longrightarrow\; A \cup A' \in \mathcal{A}.$$

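The next proposition compares the several notions of (sub)-additivity. In particular it
implies that if $\mathcal{A}$ is $\circ\!\circ_\emptyset$-additive then $(\mathcal{A}, \mathcal{Q}^{\mu,\mathcal{A}}, \perp)$ is $P$-subadditive for all $P \in \mathcal{M}$.

Proposition 16 Let $(\mathcal{A}, \mathcal{Q}^{\mu,\mathcal{A}}, \perp)$ be a clustering base as in the proposition that introduces
$\mathcal{Q}^{\mu,\mathcal{A}}$. If $\mathcal{A}$ is $\circ\!\circ_P$-additive for some $P \in \mathcal{M}$, then $(\mathcal{A}, \mathcal{Q}^{\mu,\mathcal{A}}, \perp)$ is $P$-subadditive.
Conversely, if $(\mathcal{A}, \mathcal{Q}^{\mu,\mathcal{A}}, \perp)$ is $P$-subadditive for all $P \ll \mu$, then $\mathcal{A}$ is $\circ\!\circ_\mu$-additive
and thus also $\circ\!\circ_P$-additive.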


3.3 Adapted Simple Measures 

We have already seen that isomonotone approximations by simple measures are not
structurally unique. In this subsection we will therefore identify the most economical structure
needed to approximate a distribution by simple measures. Such most parsimonious
structures will then be used to define continuous clusterings.

Let us begin by introducing a different view on simple measures. 

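Definition 17 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and $Q$ be a simple measure on $F \in \mathcal{F}_{\mathcal{A}}$
with the unique representation $Q = \sum_{A \in F} \alpha_A Q_A$. We define the map $\lambda_Q : F \to \mathcal{Q}$ by

$$\lambda_Q(A) := \Big( \sum_{A' \in F :\, A' \supset A} \alpha_{A'} Q_{A'}(A) \Big) \cdot Q_A, \qquad A \in F.$$

Moreover, we call the base measure $\lambda_Q(A) \in \mathcal{Q}$ the level of $A$ in $Q$.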


In some sense, the level of an $A$ in $Q$ combines all ancestor measures including $Q_A$ and
then restricts this combination to $A$, see Figure 8 for an illustration of the level of a node.
With the help of levels we can now describe structurally economical approximations
of measures by simple measures.

Figure 8: Level.
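To make the notion of a level concrete, the following Python sketch computes the scalar coefficient in front of $Q_A$ in $\lambda_Q(A)$ on a finite ground set. It assumes, as an illustration only, that each base measure $Q_A$ is the uniform probability measure on $A$, so that $Q_{A'}(A) = |A| / |A'|$ whenever $A \subset A'$. The name `level_coefficient` and the example weights are hypothetical.

```python
from fractions import Fraction

def level_coefficient(alpha, A):
    """Coefficient c with lambda_Q(A) = c * Q_A for Q = sum of alpha[A'] * Q_{A'}.

    alpha maps the nodes of the forest (frozensets) to their weights.
    Assumes Q_{A'} is uniform on A', hence Q_{A'}(A) = |A| / |A'| for A inside A'.
    """
    return sum(w * Fraction(len(A), len(Ap)) for Ap, w in alpha.items() if A <= Ap)

# Hypothetical simple measure: a root with weight 3 and one child with weight 1.
root = frozenset(range(6))
child = frozenset({0, 1})
alpha = {root: 3, child: 1}
```

For this example the level of the root has coefficient 3, while the level of the child has coefficient $3 \cdot \tfrac{2}{6} + 1 = 2$, which is exactly the mass that $Q$ puts on the child.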


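Definition 18 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and $P \in \mathcal{M}_\Omega$ a finite measure. Then a
simple measure $Q$ on a forest $F \in \mathcal{F}_{\mathcal{A}}$ is $P$-adapted iff all direct siblings $A_1, A_2$ in $F$ are:

(a) $P$-grounded: if they are kin below $P$, i.e. $\mathcal{Q}_P(A_1 \cup A_2) \ne \emptyset$, then there is a parent
around them in $F$.

(b) $P$-fine: every $b \in \mathcal{Q}_P(A_1 \cup A_2)$ can be majorized by a base measure $\tilde{b}$ that supports
all direct siblings $A_1, \dots, A_k$ of $A_1$ and $A_2$, i.e.

$$b \in \mathcal{Q}_P(A_1 \cup A_2) \;\Longrightarrow\; \exists\, \tilde{b} \in \mathcal{Q}_P(A_1 \cup \dots \cup A_k) \text{ with } \tilde{b} \ge b.$$

(c) strictly motivated: for their levels $a_1 := \lambda_Q(A_1)$ and $a_2 := \lambda_Q(A_2)$ in $Q$ there is
an $\alpha \in (0,1)$ such that every base measure $b$ that supports them below $P$ is not larger
than $\alpha a_1$ or $\alpha a_2$, i.e.

$$\forall\, b \in \mathcal{Q} : \; b \ge \alpha a_1 \text{ or } b \ge \alpha a_2 \;\Longrightarrow\; b \notin \mathcal{Q}_P(A_1 \cup A_2). \qquad (20)$$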
Finally, an isomonotone sequence $(Q_n, F_n) \uparrow P$ is adapted, if $Q_n$ is $P$-adapted for all $n \ge 1$.

Since siblings are $\perp$-separated, they are $\perp\!\!\!\perp_P$-separated, so strict motivation is no
contradiction to $P$-subadditivity. Levels are called motivated iff they satisfy condition (20)
for $\alpha = 1$. Figure 9 illustrates the three conditions describing adapted measures. It can be
shown that if $\mathcal{A}$ is $\circ\!\circ$-additive, then any isomonotone sequence can be made adapted.

Figure 9: Illustrations for motivated, grounded, fine, and therefore adapted. (Panel labels:
“$\exists\, b \in \mathcal{Q}_P(A \cup A')$ with $a' \le b$ but without parent”; “Grounded but not fine: $b \in \mathcal{Q}_P(A_1 \cup A_2)$
cannot be majorized to support $A_3$”; “Adapted: grounded, fine and motivated”.)

The following self-consistency result shows that every simple measure is adapted to
itself. This result will guarantee that the extension of the clustering from $\mathcal{S}$ to $\bar{\mathcal{S}}$ is indeed
an extension.

Proposition 19 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base. Then every $Q \in \mathcal{S}(\mathcal{A})$ is $Q$-adapted.

3.4 Continuous Clustering 

In this subsection we finally introduce continuous clusterings with the help of adapted, 
isomonotone sequences. Furthermore, we will show the existence and uniqueness of such 
clusterings. 

Let us begin by introducing a notation that will be used to identify two clusterings as
identical. To this end let $F_1, F_2 \in \mathcal{F}_{\bar{\mathcal{A}}}$ be two forests and $P \in \mathcal{M}_\Omega$ be a finite measure. Then
we write

$$F_1 =_P F_2,$$

if there exists a graph isomorphism $\zeta : F_1 \to F_2$ such that $P(A \,\triangle\, \zeta(A)) = 0$ for all $A \in
F_1$. Now our first result shows that adapted isomonotone limits of two different sequences
coincide in this sense.

Theorem 20 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a stable clustering base and $P \in \mathcal{M}_\Omega$ be a finite measure
such that $\mathcal{A}$ is $P$-subadditive. If $(Q_n, F_n), (Q'_n, F'_n) \uparrow P$ are adapted isomonotone sequences
then we have

$$\lim_n s(F_n) =_P \lim_n s(F'_n).$$
Theorem 20 shows that different adapted sequences approximating a measure $P$ necessarily
have isomorphic forests and that the corresponding limit nodes of the forests coincide
up to $P$-null sets. This result makes the following axiom possible.

Axiom 3 (Continuous Clustering) Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base, $\mathcal{P} \subset \mathcal{M}_\Omega$ be a
set of measures. We say that $c : \mathcal{P} \to \mathcal{F}$ is a continuous clustering, if it is an additive
clustering and for all $P \in \mathcal{P}$ and all adapted isomonotone sequences $(Q_n, F_n) \uparrow P$ we have

$$c(P) =_P \lim_n s(F_n).$$

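The following main result of this section shows that there exist continuous clusterings
and that they are uniquely determined on a large subset of $\bar{\mathcal{S}}(\mathcal{A})$.

Theorem 21 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a stable clustering base and set

$$\mathcal{P}_{\mathcal{A}} := \{ P \in \bar{\mathcal{S}}(\mathcal{A}) \mid \mathcal{A} \text{ is } P\text{-subadditive and there is } (Q_n, F_n) \uparrow P \text{ adapted} \}.$$

Then there exists a continuous clustering $c_{\mathcal{A}}$ on $\mathcal{P}_{\mathcal{A}}$. Moreover, $c_{\mathcal{A}}$ is unique on $\mathcal{P}_{\mathcal{A}}$,
that is, for all continuous clusterings $c : \mathcal{P} \to \mathcal{F}$ we have

$$c_{\mathcal{A}}(P) =_P c(P), \qquad P \in \mathcal{P}_{\mathcal{A}}.$$

Recall from Proposition 16 that $\mathcal{A}$ is $P$-subadditive for all $P \in \mathcal{M}_\Omega$ if $\mathcal{A}$ is $\circ\!\circ_\emptyset$-additive.
It can be shown that if $\mathcal{A}$ is $\circ\!\circ$-additive, then any isomonotone sequence can be made
adapted. In this case we thus have $\mathcal{P}_{\mathcal{A}} = \bar{\mathcal{S}}(\mathcal{A})$ and Theorem 21 shows that there exists a
unique continuous clustering on $\bar{\mathcal{S}}(\mathcal{A})$.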


3.5 Density Based Clustering 


Let us recall from Proposition [3] that a simple way to define a set of base measures $\mathcal{Q}$ was
with the help of a reference measure $\mu$. Given a stable separation relation $\perp$, we denoted
the resulting stable clustering base by $(\mathcal{A}, \mathcal{Q}^{\mu,\mathcal{A}}, \perp)$. Now observe that for this clustering
base every $Q \in \mathcal{S}(\mathcal{A})$ is $\mu$-absolutely continuous and its unique representation yields the
$\mu$-density $f = \sum_{A \in F} \alpha_A \mathbf{1}_A$ for suitable coefficients $\alpha_A > 0$. Consequently, each level set
$\{f > \lambda\}$ consists of some elements $A \in F$, and if all elements in $\mathcal{A}$ are connected, the
additive clustering $c(Q)$ of $Q$ thus coincides with the “classical” cluster tree obtained from
the level sets. It is therefore natural to ask whether such a relation still holds for continuous
clusterings on distributions $P \in \mathcal{P}_{\mathcal{A}}$.

Clearly, the first answer to this question needs to be negative, since in general the cluster
tree is an infinite forest whereas our clusterings are always finite. To illustrate this, let us
consider the Factory density on $[0,1]$, which is defined by

$$f(x) := \begin{cases} 1 - x, & \text{if } x \in [0, \tfrac12), \\ 1, & \text{if } x \in [\tfrac12, 1]. \end{cases}$$

Clearly, this gives the following $\perp_\emptyset$-decomposition of the level sets:

$$\{f > \lambda\} = \begin{cases} [0,1], & \text{if } \lambda < \tfrac12, \\ [0, 1-\lambda) \cup [\tfrac12, 1], & \text{if } \tfrac12 \le \lambda < 1, \end{cases}$$

which leads to the clustering forest $F_f = \{ [0,1], [\tfrac12,1] \} \cup \{ [0, 1-\lambda) \mid \tfrac12 \le \lambda < 1 \}$. Now
observe that even though $F_f$ is infinite, it is as a graph somehow simple: there is a root $[0,1]$, a
node $[\tfrac12, 1]$, and an infinite chain $[0, 1-\lambda)$, $\tfrac12 \le \lambda < 1$. Replacing this chain by its supremum
$[0, \tfrac12)$ we obtain the structured forest

$$\{ [0,1],\ [0,\tfrac12),\ [\tfrac12,1] \},$$

for which we can then ask whether it coincides with the continuous clustering obtained from
$(\mathcal{A}, \mathcal{Q}^{\mu,\mathcal{A}}, \perp_\emptyset)$ if $\mathcal{A}$ consists of all closed intervals in $[0,1]$ and $\mu$ is the Lebesgue measure.
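The level sets of the Factory density can be inspected numerically. The following Python sketch assumes that the Factory density is $f(x) = 1 - x$ on $[0, \tfrac12)$ and $f(x) = 1$ on $[\tfrac12, 1]$, and it approximates connected components on a regular grid, so a gap narrower than the grid spacing is invisible. The names `factory` and `level_set_components` are hypothetical.

```python
from fractions import Fraction

def factory(x):
    """Assumed Factory density: 1 - x on [0, 1/2) and 1 on [1/2, 1]."""
    return 1 - x if x < Fraction(1, 2) else Fraction(1)

def level_set_components(f, lam, n=100):
    """Grid components of {f > lam} on {0, 1/n, ..., 1}.

    Each component is reported as (first grid point, last grid point).
    """
    grid = [Fraction(i, n) for i in range(n + 1)]
    comps, run = [], []
    for x in grid:
        if f(x) > lam:
            run.append(x)
        elif run:
            # A grid point outside the level set closes the current component.
            comps.append((run[0], run[-1]))
            run = []
    if run:
        comps.append((run[0], run[-1]))
    return comps

low = level_set_components(factory, Fraction(1, 4))
high = level_set_components(factory, Fraction(3, 4))
```

At the level $\lambda = \tfrac14$ the grid shows a single component, while at $\lambda = \tfrac34$ it shows two, whose left part ends at the last grid point below $1 - \lambda = \tfrac14$. Exact rational arithmetic is used so that comparisons at the kink $x = \tfrac12$ are not affected by rounding.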

To answer this question we first need to formalize the operation that assigns a structured
forest to an infinite forest. To this end, let $F$ be an arbitrary $\perp$-forest. We say that $\mathcal{C} \subset F$ is a
pure chain, iff for all $C, C' \in \mathcal{C}$ and $A \in F \setminus \mathcal{C}$ the following two implications hold:

$$A \subset C \;\Longrightarrow\; A \subset C',$$

$$C \subset A \;\Longrightarrow\; C' \subset A.$$

Roughly speaking, the first implication ensures that no node above a bifurcation is contained
in the chain, while the second implication ensures that no node below a bifurcation is
contained in the chain. With this interpretation in mind it is not surprising that we can
define the structure of the forest $F$ with the help of the maximal pure chains by setting

$$s(F) := \Big\{ \bigcup \mathcal{C} \;\Big|\; \mathcal{C} \subset F \text{ is a maximal pure chain} \Big\}.$$

Note that for infinite forests the structure $s(F)$ may or may not be finite. For example, for
the factory density it is finite as we have already seen above.
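For a finite forest the structure can be computed directly. The following Python sketch is an illustration under our own reading of the definition: in a finite forest we take a chain to be pure exactly when each of its elements, except the smallest, has precisely one child in the forest, so that a node starts a new maximal pure chain iff it is a root or its parent has at least two children. The union of a finite chain is its largest element. Nodes are `frozenset` objects and the name `structure` is hypothetical.

```python
def structure(forest):
    """Tops (that is, unions) of the maximal pure chains of a finite forest."""
    forest = set(forest)

    def parent(A):
        # In a forest the strict supersets of A form a chain; the smallest is the parent.
        sups = [B for B in forest if A < B]
        return min(sups, key=len) if sups else None

    children = {A: [B for B in forest if parent(B) == A] for A in forest}
    tops = set()
    for A in forest:
        p = parent(A)
        # A continues the chain of its parent only if it is the parent's only child.
        if p is None or len(children[p]) > 1:
            tops.add(A)
    return tops

# Discretized Factory forest: the integer k stands for the point k / 10.
root = frozenset(range(0, 11))                              # [0, 1]
right = frozenset(range(5, 11))                             # [1/2, 1]
chain = [frozenset(range(0, k)) for k in (5, 4, 3, 2, 1)]   # [0, 1 - lambda)
s = structure([root, right] + chain)
```

On this discretized version of the Factory forest the result consists of the root, the right node and the largest element of the chain, mirroring the structured forest $\{[0,1], [0,\tfrac12), [\tfrac12,1]\}$. For infinite chains the union need not be attained, which this finite sketch cannot represent.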

We have seen in Lemma 11 that the nodes of a continuous clustering are $\perp$-separated
elements of $\bar{\mathcal{A}}$. Consequently, it only makes sense to compare continuous clustering with
the structure of a level set forest, if this forest shares this property. This is ensured in the
following definition.

Definition 22 Let $f : \Omega \to [0,\infty]$ be a measurable function and $(\mathcal{A}, \mathcal{Q}, \perp)$ be a stable
clustering base. Then $f$ is of $(\bar{\mathcal{A}}, \mathcal{Q}, \perp)$-type iff there is a dense subset $\Lambda \subset [0, \sup f)$ such
that for all $\lambda \in \Lambda$ the level set $\{f > \lambda\}$ is a finite union of pairwise $\perp$-separated events
$B_1(\lambda), \dots, B_{k(\lambda)}(\lambda) \in \bar{\mathcal{A}}$. If this is the case the level set $\perp$-forest is given by

$$F_{f,\Lambda} := \{ B_i(\lambda) \mid i \le k(\lambda) \text{ and } \lambda \in \Lambda \}.$$

Note that for given $f$ and $\Lambda$ the forest $F_{f,\Lambda}$ is indeed well-defined since $\perp$ is an
$\bar{\mathcal{A}}$-separation relation by Lemma 11 and therefore the decomposition of $\{f > \lambda\}$ into the sets
$B_1(\lambda), \dots, B_{k(\lambda)}(\lambda) \in \bar{\mathcal{A}}$ is unique by Lemma 30.

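With the help of these preparations we can now formulate the main result of this
subsection, which compares continuous clusterings with the structure of level set $\perp$-forests:

Theorem 23 Let $\mu \in \mathcal{M}_\Omega$, $(\mathcal{A}, \mathcal{Q}^{\mu,\mathcal{A}}, \perp)$ the stable clustering base described in Proposition
[3] and $P \in \mathcal{M}_\Omega$ such that $\mathcal{A}$ is $P$-subadditive. Assume that $P$ has a $\mu$-density $f$ that is of
$(\bar{\mathcal{A}}, \mathcal{Q}^{\mu,\mathcal{A}}, \perp)$-type with a dense subset $\Lambda$ such that $s(F_{f,\Lambda})$ is finite and for all $\lambda \in \Lambda$ and all
$i < j \le k(\lambda)$ we have $\overline{B_i(\lambda)} \perp \overline{B_j(\lambda)}$. Then we have $P \in \bar{\mathcal{S}}(\mathcal{A})$ and

$$c(P) =_\mu s(F_{f,\Lambda}).$$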




Thomann, Steinwart, and Schmid 


On the other hand, it is not difficult to show that if $P \in \bar{\mathcal{S}}(\mathcal{A})$ then $P$ has a density of
$(\bar{\mathcal{A}}, \mathcal{Q}, \perp)$-type. We do not know though whether there has to be a density of $(\bar{\mathcal{A}}, \mathcal{Q}, \perp)$-type
for which even the closures of siblings are separated.

If $\operatorname{supp} \mu \ne \Omega$ one might think that this is not true since on the complement of the
support anything goes. To be more precise (if $\mu$ is not inner regular and hence no support
is defined), assume there is an open set $O \subset \Omega$ with $\mu(O) = 0$. This then means that there
is no base set $A \subset O$, because base sets are support sets. Hence anything that would happen
on $O$ is determined by what happens in $\operatorname{supp} P$.

In the literature density based clustering is only considered for continuous densities since
they may serve as a canonical version of the density. The following result investigates such
densities.

Proposition 24 For a compact $\Omega \subset \mathbb{R}^d$ and a measure $\mu \in \mathcal{M}_\Omega$ we consider the stable
clustering base $(\mathcal{A}, \mathcal{Q}^{\mu,\mathcal{A}}, \perp_\emptyset)$. We assume that all open, connected sets are contained in $\bar{\mathcal{A}}$
and that $P \in \mathcal{M}_\Omega$ is a finite measure such that $\mathcal{A}$ is $P$-subadditive. If $P$ has a continuous
density $f$ that has only finitely many local maxima $x_1^*, \dots, x_k^*$ then $P \in \mathcal{P}_{\mathcal{A}}$ and there is a
bijection $\psi : \{x_1^*, \dots, x_k^*\} \to \min c(P)$ such that

$$x_i^* \in \psi(x_i^*).$$

In this case $c(P) =_\mu \{ B_i(\lambda) \mid i \le k(\lambda) \text{ and } \lambda \in \Lambda_0 \}$ where $\Lambda_0 = \{ 0 = \lambda_0 < \dots < \lambda_m <
\sup f \}$ is the finite set of levels at which the splits occur.

4. Examples 

After having given the skeleton of this theory we now give more examples of how to use it. 
This should as well motivate some of the design decisions. It will also become clear in what 
way the choice of a clustering base {A, Q, T) influences the clustering. 

4.1 Base Sets and Separation Relations 

In this subsection we present several examples of clustering bases. Our first three examples 
consider different separation relations. 

Example 1 (Separation relations) The following define stable $\mathcal{A}$-separation relations:

(a) Disjointness: If $\mathcal{A} \subset \mathcal{B}$ is a collection of non-empty, closed, and topologically
connected sets then

$$B \perp_\emptyset B' \;:\iff\; B \cap B' = \emptyset.$$

(b) $\tau$-separation: Let $(\Omega, d)$ be a metric space, $\tau > 0$, and $\mathcal{A} \subset \mathcal{B}$ be a collection of
non-empty, closed, and $\tau$-connected sets then

$$B \perp_\tau B' \;:\iff\; d(B, B') \ge \tau.$$

(c) Linear separation: Let $H$ be a Hilbert space with inner product $\langle \cdot \mid \cdot \rangle$ and $\Omega \subset H$.
Then non-empty events $A, B \subset \Omega$ are linearly separated, $A \perp_\ell B$, iff $A \perp_\emptyset B$ and

$$\exists\, v \in H \setminus \{0\},\ \alpha \in \mathbb{R}\ \ \forall\, a \in A,\ b \in B : \quad \langle a \mid v \rangle \le \alpha \text{ and } \langle b \mid v \rangle \ge \alpha.$$

The latter means there is an affine hyperplane $U \subset H$ such that $A$ and $B$ are on
different sides. Then $\perp_\ell$ is an $\mathcal{A}$-separation relation if no base set $A \in \mathcal{A}$ can be
written as a finite union of pairwise $\perp_\ell$-disjoint closed sets. It is stable if $H$ is
finite-dimensional.
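The three relations of Example 1 are easy to evaluate for finite point sets in $\mathbb{R}^d$. The following Python sketch is an illustration only: it treats events as finite collections of coordinate tuples, takes $d$ to be the Euclidean metric, reads $\tau$-separation as $d(B, B') \ge \tau$, and checks linear separation only for a hyperplane supplied by the caller. All function names are hypothetical.

```python
import math

def set_distance(B, Bp):
    """d(B, B'): smallest Euclidean distance between a point of B and a point of B'."""
    return min(math.dist(x, y) for x in B for y in Bp)

def disjoint(B, Bp):
    """Disjointness: B and B' have no point in common."""
    return not (set(B) & set(Bp))

def tau_separated(B, Bp, tau):
    """tau-separation, read here as d(B, B') >= tau."""
    return set_distance(B, Bp) >= tau

def separated_by(A, B, v, alpha):
    """Check <a|v> <= alpha for all a in A and <b|v> >= alpha for all b in B."""
    dot = lambda x: sum(xi * vi for xi, vi in zip(x, v))
    return all(dot(a) <= alpha for a in A) and all(dot(b) >= alpha for b in B)

B1 = [(0.0, 0.0), (1.0, 0.0)]
B2 = [(3.0, 0.0)]
```

For these two sets the distance is 2, so they are $\tau$-separated for $\tau = 1.5$ but not for $\tau = 2.5$, and the hyperplane $\{x_1 = 2\}$ witnesses linear separation.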


Our next goal is to present some examples of base set collections $\mathcal{A}$. Since these describe
the sets for which we need to agree that they can only be trivially clustered, smaller collections $\mathcal{A}$
are generally preferred. Let $\mu$ be the Lebesgue measure on $\mathbb{R}^d$. To define possible collections
$\mathcal{A}$ we will consider the following building blocks in $\mathbb{R}^d$:

$\mathcal{C}_{\mathrm{Dyad}} := \{$axis-parallel boxes with dyadic coordinates$\}$,
$\mathcal{C}_p := \{$closed $\ell_p^d$-balls$\}$, $p \in [1, \infty]$,
$\mathcal{C}_{\mathrm{Conv}} := \{$convex and compact $\mu$-support sets$\}$.

$\mathcal{C}_{\mathrm{Dyad}}$ corresponds to the cells of a histogram whereas $\mathcal{C}_p$ has connections to moving-window
density estimation. When combined with $\perp_\emptyset$ or $\perp_\tau$ and base measures of the form (14) these
collections may already serve as clustering bases. However, $\bar{\mathcal{C}}_\bullet$ and $\bar{\mathcal{S}}(\mathcal{C}_\bullet)$ are not very rich since
monotone increasing sequences in $\mathcal{C}_\bullet$ converge to sets of the same shape, and hence the sets
in $\bar{\mathcal{C}}_\bullet$ have the same shape constraint as those in $\mathcal{C}_\bullet$. As a result the sets of measures $\bar{\mathcal{S}}(\mathcal{C}_\bullet)$ for
which we can determine the unique continuous clustering are rather small. However, more
interesting collections can be obtained by considering finite, connected unions built of such
sets. To describe such unions in general we need the following definition.

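Definition 25 Let $\perp\!\!\!\perp$ be a relation on $\mathcal{B}$, $\circ\!\circ$ be its negation, and $\mathcal{C} \subset \mathcal{B}$ be a class of
non-empty events. The $\perp\!\!\!\perp$-intersection graph on $\mathcal{C}$, $\mathcal{G}_{\perp\!\!\!\perp}(\mathcal{C})$, has $\mathcal{C}$ as nodes and there is an
edge between $A, B \in \mathcal{C}$ iff $A \mathrel{\circ\!\circ} B$. We define:

$$\mathbb{C}_{\perp\!\!\!\perp}(\mathcal{C}) := \{ C_1 \cup \dots \cup C_k \mid C_1, \dots, C_k \in \mathcal{C} \text{ and the graph } \mathcal{G}_{\perp\!\!\!\perp}(\{C_1, \dots, C_k\}) \text{ is connected} \}.$$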
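Whether a finite family of building blocks forms a connected union in the sense of Definition 25 is a plain graph connectivity question. The following Python sketch uses a depth-first search on the intersection graph. The blocks are modelled as `frozenset` objects and the relation `related` (the negation of the separation relation) is passed in by the caller. The names `is_connected_union` and `overlap` are hypothetical.

```python
def is_connected_union(sets, related):
    """True iff the graph on `sets` with an edge whenever related(A, B) is connected."""
    sets = list(sets)
    if not sets:
        return False
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(sets)):
            if j not in seen and related(sets[i], sets[j]):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(sets)

# Negation of disjointness: two blocks are related iff they share a point.
overlap = lambda A, B: bool(A & B)

blocks = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({5, 6})]
```

The first two blocks overlap in the point `2`, so their union belongs to the generated class, whereas adding the third block, which touches neither of them, breaks connectivity.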

Obviously any separation relation can be used. But one can also consider weaker relations
like $\perp\!\!\!\perp_P$, or e.g. $A \perp\!\!\!\perp A'$ if $A \cap A'$ has empty interior, or if it contains no ball of size $\tau$. Such
examples yield smaller $\mathcal{A}$ and indeed in these cases $\bar{\mathcal{S}}$ is much smaller.

The following example provides stable clustering bases. 


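Example 2 (Clustering bases) The following examples are $\circ\!\circ_\emptyset$-additive:

$\mathcal{A}_{\mathrm{Dyad}} := \mathbb{C}_{\perp_\emptyset}(\mathcal{C}_{\mathrm{Dyad}}) = \{$finite connected unions of boxes with dyadic coordinates$\}$,
$\mathcal{A}_p := \mathbb{C}_{\perp_\emptyset}(\mathcal{C}_p) = \{$finite connected unions of closed $\ell_p^d$-balls$\}$,
$\mathcal{A}_{\mathrm{Conv}} := \mathbb{C}_{\perp_\emptyset}(\mathcal{C}_{\mathrm{Conv}}) = \{$finite connected unions of convex $\mu$-support sets$\}$.

Then $\mathcal{A}_{\mathrm{Dyad}}, \mathcal{A}_p, \mathcal{A}_{\mathrm{Conv}} \subset \mathcal{K}(\mu)$. Furthermore the following examples are $\circ\!\circ_\tau$-additive:

$\mathcal{A}^\tau_{\mathrm{Dyad}} := \mathbb{C}_{\perp_\tau}(\mathcal{C}_{\mathrm{Dyad}})$, $\quad \mathcal{A}^\tau_p := \mathbb{C}_{\perp_\tau}(\mathcal{C}_p)$, $\quad \mathcal{A}^\tau_{\mathrm{Conv}} := \mathbb{C}_{\perp_\tau}(\mathcal{C}_{\mathrm{Conv}})$.

This leads to the following examples of stable clustering bases:

$(\mathcal{A}_{\mathrm{Dyad}}, \mathcal{Q}^{\mu,\mathcal{A}_{\mathrm{Dyad}}}, \perp_\emptyset)$, $\quad (\mathcal{A}_p, \mathcal{Q}^{\mu,\mathcal{A}_p}, \perp_\emptyset)$, $\quad (\mathcal{A}_{\mathrm{Conv}}, \mathcal{Q}^{\mu,\mathcal{A}_{\mathrm{Conv}}}, \perp_\emptyset)$,
$(\mathcal{A}^\tau_{\mathrm{Dyad}}, \mathcal{Q}^{\mu,\mathcal{A}^\tau_{\mathrm{Dyad}}}, \perp_\tau)$, $\quad (\mathcal{A}^\tau_p, \mathcal{Q}^{\mu,\mathcal{A}^\tau_p}, \perp_\tau)$, $\quad (\mathcal{A}^\tau_{\mathrm{Conv}}, \mathcal{Q}^{\mu,\mathcal{A}^\tau_{\mathrm{Conv}}}, \perp_\tau)$,
$(\mathcal{A}_{\mathrm{Dyad}}, \mathcal{Q}^{\mu,\mathcal{A}_{\mathrm{Dyad}}}, \perp_\tau)$, $\quad (\mathcal{A}_p, \mathcal{Q}^{\mu,\mathcal{A}_p}, \perp_\tau)$, $\quad (\mathcal{A}_{\mathrm{Conv}}, \mathcal{Q}^{\mu,\mathcal{A}_{\mathrm{Conv}}}, \perp_\tau)$.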



Figure 10: Some examples of sets in $\mathcal{A}_{\mathrm{box}}$, $\mathcal{A}_{\mathrm{Conv}}$ and their closure.


The first row is the most common case, using connected sets and their natural separation
relation. The second row is the $\tau$-connected case. The third row shows how the fine tuning
can be handled: We consider connected base sets, but siblings need to be $\tau$-separated, hence
e.g. saddle points cannot be approximated.

The larger the extended class $\bar{\mathcal{A}}$ is, the more measures we can cluster. The following
proposition provides a sufficient condition for $\bar{\mathcal{A}}$ being rich.

Proposition 26 Assume all $A \in \mathcal{A}$ are path-connected. Then all $B \in \bar{\mathcal{A}}$ are path-connected.
Furthermore assume that $\mathcal{A}$ is intersection-additive and that it contains a countable
neighbourhood base. Then $\bar{\mathcal{A}}$ contains all open, path-connected sets.

One can show that the first statement also holds for topological connectedness.
Furthermore note that $\mathcal{C}_{\mathrm{Dyad}}$ is a countable neighbourhood base, and therefore $\mathcal{A}_{\mathrm{Dyad}}$, $\mathcal{A}_p$, and
$\mathcal{A}_{\mathrm{Conv}}$ fulfill the conditions of Proposition 26.


4.2 Clustering of Densities 

Following the manual to cluster densities given in Theorem 23 by decomposing the density
level sets into $\perp$-disjoint components, one first needs to understand the $\perp$-disjoint
components of general events. In this subsection we investigate such decompositions and the
resulting clusterings. We assume $\mu$ to be the Lebesgue measure on some suitable $\Omega \subset \mathbb{R}^d$ and
let the base measures be the ones considered in Proposition [3]. For visualization purposes
we further restrict our considerations to the one- and two-dimensional case only.


4.2.1 Dimension d = 1 


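In the one-dimensional case, in which $\Omega$ is an interval, the examples $\mathcal{A}_p = \mathcal{A}_{\mathrm{Conv}}$ simply
consist of compact intervals, and their monotone closures consist of all intervals. To understand
the resulting clusters let us first consider the twin peaks density:

$$f(x) := \tfrac13 - \min\{ |x - \tfrac13|,\ |x - \tfrac23| \}.$$

Clearly, this gives the following $\perp_\emptyset$-decomposition of the level sets:

$$H_f(\lambda) = (\lambda, 1 - \lambda) \ \text{ for } \lambda < \tfrac16, \qquad H_f(\lambda) = (\lambda, \tfrac23 - \lambda) \cup (\tfrac13 + \lambda, 1 - \lambda) \ \text{ for } \tfrac16 \le \lambda < \tfrac13,$$

and hence the $\perp_\emptyset$-clustering forest is $\{ (0,1), (\tfrac16, \tfrac12), (\tfrac12, \tfrac56) \}$. Since none of the boundary
points can be reached, any isomonotone, adapted sequence yields this result. However, the
clustering changes if the separation relation $\perp_\tau$ is considered. We obtain

$$H_f(\lambda) = (\lambda, 1 - \lambda) \ \text{ for } \lambda < \tfrac16, \qquad H_f(\lambda) = (\lambda, \tfrac23 - \lambda) \cup (\tfrac13 + \lambda, 1 - \lambda) \ \text{ for } \tfrac16 + \tfrac{\tau}{2} \le \lambda < \tfrac13$$

if $\tau \in (0, \tfrac13)$ and the resulting $\perp_\tau$-clustering is $\{ (0,1), (\tfrac16 + \tfrac{\tau}{2}, \tfrac12 - \tfrac{\tau}{2}), (\tfrac12 + \tfrac{\tau}{2}, \tfrac56 - \tfrac{\tau}{2}) \}$. Finally,
if $\tau \ge \tfrac13$ then all level sets are $\tau$-connected and the forest is simply $\{(0,1)\}$. In Table 1 more
examples of clustering of densities can be found.

Table 1: Examples of clustering in dimension $d = 1$ using $\mathcal{A}_p$ and three separation relations.
(Columns: Merlon, Camel, M, Factory. Rows: density; $(\mathcal{A}_p, \perp_\emptyset)$; $(\mathcal{A}_p, \perp_\tau)$ with $\tau$ small;
$(\mathcal{A}_p, \perp_\tau)$ with $\tau$ large. The entries are pictures.)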
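The dependence of the twin peaks decomposition on $\tau$ can be traced with exact arithmetic. The following Python sketch assumes that the twin peaks density is $f(x) = \tfrac13 - \min\{|x - \tfrac13|, |x - \tfrac23|\}$, so that for $0 \le \lambda < \tfrac13$ the level set is $(\lambda, \tfrac23 - \lambda) \cup (\tfrac13 + \lambda, 1 - \lambda)$, and it reads $\tau$-separation as a gap of at least $\tau$. The names `twin_peaks` and `tau_components` are hypothetical.

```python
from fractions import Fraction as Fr

def twin_peaks(x):
    """Assumed twin peaks density with peaks at 1/3 and 2/3."""
    return Fr(1, 3) - min(abs(x - Fr(1, 3)), abs(x - Fr(2, 3)))

def tau_components(lam, tau):
    """tau-separated pieces of {f > lam} for 0 <= lam < 1/3.

    Each piece is a list of open intervals (left end, right end).
    Use tau = 0 for plain disjointness.
    """
    left = (lam, Fr(2, 3) - lam)
    right = (Fr(1, 3) + lam, 1 - lam)
    gap = right[0] - left[1]
    if gap < 0:
        return [[(lam, 1 - lam)]]      # the two intervals overlap
    if gap >= tau:
        return [[left], [right]]       # separated by at least tau
    return [[left, right]]             # disjoint, but closer than tau

tau = Fr(1, 10)
first_split = tau_components(Fr(1, 6) + tau / 2, tau)
```

For $\tau = \tfrac{1}{10}$ the first level at which the level set splits into two $\tau$-separated pieces is $\tfrac16 + \tfrac{\tau}{2} = \tfrac{13}{60}$, and the two pieces at that level are $(\tfrac{13}{60}, \tfrac{9}{20})$ and $(\tfrac{11}{20}, \tfrac{47}{60})$.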

4.2.2 Dimension d = 2 

Our goal in this subsection is to understand the T-separated decomposition of closed events. 
We further present the resulting clusterings for some densities that are indicator functions 
and illustrate clusterings for continuous densities having a saddle point. 

Let us begin by assuming that $P$ has a Lebesgue density of the form $\mathbf{1}_B$, where $B$ is some
$\mu$-support set. Then one can show, see Lemma 50 for details, that adapted, isomonotone
sequences $(F_n)$ of forests are of the form $F_n = \{A_1^n, \dots, A_k^n\}$, where the elements of
each forest $F_n$ are mutually disjoint and can be ordered in such a way that $A_i^1 \subset A_i^2 \subset \dots$.
The limit forest $F_\infty$ then consists of the $k$ pairwise $\perp$-separated sets:

$$B_i := \bigcup_{n \ge 1} A_i^n,$$

and there is a $\mu$-null set $N \in \mathcal{B}$ with

$$B = B_1 \cup \dots \cup B_k \cup N. \qquad (21)$$

Let us now consider the base sets $\mathcal{A}_p$ in Example 2. By Proposition 26 we know that
$\bar{\mathcal{A}}_p$ contains all open, path-connected sets and therefore all open $\ell_q^d$-balls. Moreover, all
closed $\ell_q^d$-balls $B$ are $\mu$-support sets with $\mu(\partial B) = 0$. Our initial consideration shows that
$\mathbf{1}_B$ can be approximated by an adapted, isomonotone sequence $(F_n)$ of forests of the form
$F_n = \{A^n\}$ with $A^n \in \mathcal{A}_p$. However, depending on $p$ and $q$ the $\mu$-null set $N$ in (21) may
differ.

Now that we have an understanding of $\bar{\mathcal{A}}_p$ and adapted, isomonotone approximations
we can investigate some more interesting cases and appreciate the influence of the choice of
$\mathcal{A}$ on the outcome of the clustering in the following example.

Example 3 (Clustering of indicators) We consider 6 examples of $\mu$-support sets $B \subset
\mathbb{R}^2$. The first 4 have two parts that only intersect at one point, the second to last has two
topological components, and the last one is connected in a fat way. By natural approximations
we get the clusterings of Table 2. The red dots indicate points which are never achieved
by any approximation. Observe how the geometry encoded in $\mathcal{A}$ shapes the clustering. Since
$\mathcal{A}_{\mathrm{Conv}}$ and $\mathcal{A}_2$ are invariant under rotation, they yield the same structure of clustering for
rotated sets. The classes $\mathcal{A}_1$ and $\mathcal{A}_\infty$ on the other hand are not rotation-invariant and
therefore the clustering depends on the orientation of $B$.

Table 2: Clusterings of indicators. (Columns: $\mathcal{A}_1$, $\mathcal{A}_2$, $\mathcal{A}_\infty$, $\mathcal{A}_{\mathrm{Conv}}$. The entries are pictures.)


After having familiarized ourselves with the clustering of indicator functions we finally 
consider a continuous density that has a saddle point. 


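Example 4 On $\Omega := [-1,1]^2$ consider the density $f : \Omega \to [0,2]$ given by $f(x,y) :=
x \cdot y + 1$. Then we have the following $\perp_\emptyset$-decomposition of the level sets $H_f(\lambda)$ of $f$:

$$H_f(\lambda) = \begin{cases} \{(x,y) : xy > \lambda - 1\} & \text{if } \lambda \in [0,1), \\ [-1,0)^2 \cup (0,1]^2 & \text{if } \lambda = 1, \\ \{(x,y) : x < 0 \text{ and } xy > \lambda - 1\} \cup \{(x,y) : x > 0 \text{ and } xy > \lambda - 1\} & \text{if } \lambda \in (1,2). \end{cases}$$

For $(\mathcal{A}_p, \mathcal{Q}^{\mu,\mathcal{A}_p}, \perp_\emptyset)$ the clustering forest is therefore given by a picture in the original.
Moreover, for $(\mathcal{A}_2, \mathcal{Q}^{\mu,\mathcal{A}_2}, \perp_\tau)$ the clustering forest is likewise shown as a picture.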

4.3 Hausdorff Measures 

So far we have only considered clusterings of Lebesgue absolutely continuous distributions.
In this subsection we provide some examples indicating that the developed theory goes far
beyond this standard example. At first, lower dimensional base sets and their resulting
clusterings are investigated. Afterwards we discuss collections of base sets of different dimensions
and provide clusterings for some measures that are not absolutely continuous to any
Hausdorff measure. For the sake of simplicity we will restrict our considerations to $\perp_\emptyset$-clusterings,
but generalizations along the lines of the previous subsections are straightforward.
4.3.1 Lower Dimensional Base Sets


Let us begin by recalling that the $s$-dimensional Hausdorff-measure on $\mathcal{B}$ is defined by

$$\mathcal{H}^s(B) := \lim_{\varepsilon \to 0} \inf \Big\{ \sum_{i=1}^{\infty} (\operatorname{diam}(B_i))^s \;\Big|\; B \subset \bigcup_i B_i \text{ and } \forall i \in \mathbb{N} : \operatorname{diam}(B_i) \le \varepsilon \Big\}.$$

Moreover, the Hausdorff-dimension of a $B \in \mathcal{B}$ is the value $s \in [0,d]$ at which $s \mapsto \mathcal{H}^s(B)$
jumps from $\infty$ to $0$. If $B$ has Hausdorff-dimension $s$, then $\mathcal{H}^s(B)$ can be either zero, finite,
or infinite. Hausdorff-measures are inner regular (Federer, 1969, Cor. 2.10.23) and $\mathcal{H}^d$ equals
the Lebesgue-measure up to a normalization factor. For a reference on Hausdorff-dimensions
and -measures we refer to Falconer (1993) and Federer (1969). Recall that given a Borel set
$C \subset \mathbb{R}^s$ a map $\varphi : C \to \Omega$ is bi-Lipschitz iff there are constants $0 < c_1, c_2 < \infty$ s.t.

$$c_1 d(x,y) \le d(\varphi(x), \varphi(y)) \le c_2 d(x,y).$$


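Lemma 27 If $C$ is a Lebesgue-support set in $\mathbb{R}^s$ and $\varphi : C \to \Omega$ is bi-Lipschitz then $C' :=
\varphi(C)$ has Hausdorff-dimension $s$ and it is an $\mathcal{H}^s$-support set in $\Omega$.

Motivated by Lemma 27 consider the following collection of $s$-dimensional base sets in $\Omega$:

$$\mathcal{C}_{p,s} := \{ \varphi(C) \subset \Omega \mid C \text{ is the closed unit } p\text{-ball in } \mathbb{R}^s \text{ and } \varphi : C \to \Omega \text{ is bi-Lipschitz} \}.$$

Using the notation of Definition 25 and Proposition [3] we further write

$$\mathcal{A}_{p,s} := \mathbb{C}_{\perp_\emptyset}(\mathcal{C}_{p,s}) \quad \text{and} \quad \mathcal{Q}^{p,s} := \mathcal{Q}^{\mathcal{H}^s, \mathcal{A}_{p,s}}.$$

By $\mathcal{A}_0 := \{ \{x\} \mid x \in \Omega \}$ we denote the singletons and by $\mathcal{Q}_0$ the collection of Dirac measures.
Since continuous mappings of connected sets are connected, $(\mathcal{A}_{p,s}, \mathcal{Q}^{p,s}, \perp_\emptyset)$ is a stable
$\perp_\emptyset$-additive clustering base. Remark that we take the union after embedding into $\mathbb{R}^d$ and
therefore also crossings do happen, e.g. the cross $[-1,1] \times \{0\} \cup \{0\} \times [-1,1] \in \mathcal{A}_{p,1}$. Another
possibility would be to embed $\mathcal{A}_p \subset \mathbb{R}^s$ via a set of transformations into $\mathbb{R}^d$. Finally we confine
the examples here only to integer Hausdorff-dimensions; it would be interesting though to
consider e.g. the Cantor set or the Sierpinski triangle. The following example presents a
resulting clustering of an $\mathcal{H}^1$-absolutely continuous measure on $\mathbb{R}^2$.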


Example 5 (Measures on curves in the plane) On $\Omega := [-1,1]^2$ consider the measure
$P_1 := f \, d\mathcal{H}^1$ whose density is given by


f{x,y) 


fMerion{x) if X > 0 and y = 0, 

< fcamelit) if X = and y = 3“^*, 

jM{t) ifx = 22*-2 and y = - 2 - 2 *. 


Here the densities and clusterings for the Merlon, the Camel and the M can be seen in
Table 1. So for $(\mathcal{A}_{p,1}, \mathcal{Q}^{p,1}, \perp_\emptyset)$ with any fixed $p \ge 1$ the clustering forest of $P_1$ is given by:

f [0,l]x{0},[0,i]x{0},[|,l]x{0}, ) 
c(Pi ) = <^ 51 ((0,1)), 51 ((0.2,0.5)), gi ((0.5,0.8)), I 
I 52([0,1]),52([0,0.5)),52((0.5,1]) J 

where gi'. [0,1] —D are given by gi{t) = (— 3 “^*) and g 2 {t) = , —2-2*). 
4.3.2 Heterogeneous Hausdorff-Dimensions

In this subsection we consider measures that can be decomposed into measures that are
absolutely continuous with respect to Hausdorff measures of different dimensions. To this
end, we write $\mu \prec \mu'$ for two measures $\mu$ and $\mu'$ on $\mathcal{B}$, iff for all $B \in \mathcal{B}$ with
$B \subset \operatorname{supp} \mu \cap \operatorname{supp} \mu'$ we have

$$\mu(B) < \infty \;\Longrightarrow\; \mu'(B) = 0.$$

For $\mathcal{Q}, \mathcal{Q}' \subset \mathcal{M}_\Omega$ we further write $\mathcal{Q} \prec \mathcal{Q}'$ if $\mu \prec \mu'$ for all $\mu \in \mathcal{Q}$ and $\mu' \in \mathcal{Q}'$. Clearly, the
relation $\prec$ is transitive. Moreover, we have $\mathcal{H}^s \prec \mathcal{H}^t$ whenever $s < t$. The next proposition
shows that clustering bases whose base measures dominate each other in the sense of $\prec$ can
be merged.

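Proposition 28 Let $(\mathcal{A}^1, \mathcal{Q}^1, \perp), \dots, (\mathcal{A}^m, \mathcal{Q}^m, \perp)$ be stable clustering bases sharing the
same separation relation $\perp$ and assume $\mathcal{Q}^1 \prec \dots \prec \mathcal{Q}^m$. We define

    $$\mathcal{A} := \bigcup_i \mathcal{A}^i \quad\text{and}\quad \mathcal{Q} := \bigcup_i \mathcal{Q}^i.$$

Then $(\mathcal{A}, \mathcal{Q}, \perp)$ is a stable clustering base.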

Proposition 1281 shows that the _L 0 -additive, stable bases -L 0 ) on can be 

merged. Unfortunately, however, its union is no longer T 0 -additive, and therefore we need to 
investigate P-subadditivity in order to describe distributions for which our theory provides 
a clustering. This is done in the next proposition. 


Proposition 29 Let {A^, Q}, A) and (A"^, Q^,T) be clustering bases with Qfi -< and Pi 
and P 2 be finite measures with Pi ~< A'^ and A^ -< P 2 - Furthermore, assume that A'' is 
Pi-subadditive for both i = 1,2 and let P := Pi P 2 . Then we have 

(a) For i = 1,2 and all base measures a G Qfi we have a < Pi, 

(b) If for all base measures a G Qp^ and supp Pi od supp a there exists a base measure 

a G Qpj 0 < a then A^ U A'^ is P-subadditive. 


To illustrate condition (b) consider clustering bases [Ap^s^Qfi'^, A%) and {Apg, , Am) 
for some s < t. The condition specifies that any such base measure 0 intersecting supp Pi 
can be majorized by one which supports supp Pi. Then all parts of supp Pi intersecting at 
least one component of $\operatorname{supp} P_2$ have to be on the same niveau line of $P_2$. Note that this is
trivially satisfied if $\operatorname{supp} P_1 \cap \operatorname{supp} P_2 = \emptyset$. Recall that mixtures of the latter form have
already been clustered in Rinaldo and Wasserman (2010) by a kernel smoothing approach.
Clearly, our axiomatic approach makes it possible to define clusterings for significantly more
involved distributions as the following two examples demonstrate.


Example 6 (Mixture of atoms and full measure) Consider H = M. Let (Mo, Qo)T 0 ) 
be the singletons with Dirac measures and consider for any fixed p > 1 the clustering base 
Qp,s)-L 0 )- Both are cDf^-additive and stable and we have Qo A Qp,i- Now consider the
measures 

Pq := 5q + 25i + 82 and Pi{dx) := sir?(^)'Hi{dx). 

Then the assumptions of Proposition\29\ are satisfied and the clustering of P := Pq + Pi is 
given by 

c{P) = c(Po) U c{Pi) = { {0}, (0, i), (i, 1), {1}, {2} } . 

Our last example combines Examples U] and [S] 


Example 7 (Mixtures in dimension 2) Consider Tt := [—1,1]^ and the densities fi and 
f 2 introduced in Examples [3| and respectively. Furthermore, consider the measures 

P2-.= f2d'H\ Pi:=fidn^ 


and the clustering bases Q^’^,_L 0 ) and {Api^ 2 , OF ’^)-L 0 ) for some fixed p,p' > 1 . As 

above Qp^ ^ Qf . And by Proposition [2^ the clustering forest of P = Pi + P 2 is given by 


Cl (Pi) U C2(P2) = 


[0,l]x{0},[0,i]x{0},[|,l]x{0}, 
51 ((0,1)), 51 ((0.2,0.5)), 51 ((0.5,0.8)), 
52 ([0,1]), 52 ([0,0.5)), 52 ((0.5,1]), 
[- 1 , 1 ] 2 , [- 1 , 0 ) 2 , ( 0 , 1]2 



where gi: [0,1] —>■ are given by gi{t) = (—3 2 *) and 52 ( 1 ) = —2 2*). Observe 

that 5 i and 52 lie on niveau lines of f 2 . 


5. Proofs 

5.1 Proofs for Section 2

We begin with some simple properties of separation relations. 

Lemma 30 Let $\perp$ be an $\mathcal{A}$-separation relation. Then the following statements are true:

(a) For all $B, B' \in \mathcal{B}$ with $B \perp B'$ we have $B \cap B' = \emptyset$.

(b) Suppose that $\perp$ is stable and $(A_n)_{n \ge 1} \subset \mathcal{A}$ is increasing. For $A := \bigcup_n A_n$ and all $B \in \mathcal{B}$
we then have

    $$A_n \perp B \text{ for all } n \ge 1 \iff A \perp B.$$

(c) Let $A \in \mathcal{A}$ and $B_1, \dots, B_k \in \mathcal{B}$ be closed. Then, where the union is taken over
pairwise $\perp$-separated sets:

    $$A \subset B_1 \overset{\perp}{\cup} \dots \overset{\perp}{\cup} B_k \implies \exists!\, i \le k : A \subset B_i.$$

(d) For all $A_1, \dots, A_k \in \mathcal{A}$ and all $A'_1, \dots, A'_{k'} \in \mathcal{A}$, we have

    $$A_1 \overset{\perp}{\cup} \dots \overset{\perp}{\cup} A_k = A'_1 \overset{\perp}{\cup} \dots \overset{\perp}{\cup} A'_{k'} \implies \{A_1, \dots, A_k\} = \{A'_1, \dots, A'_{k'}\}.$$

Proof of Lemma 30: (a). Let us write $B_0 := B \cap B'$. Monotonicity and $B \perp B'$ imply
$B_0 \perp B'$ and thus $B' \perp B_0$ by symmetry. Another application of the monotonicity gives
$B_0 \perp B_0$ and the reflexivity thus shows $B \cap B' = B_0 = \emptyset$.

(b). "$\Rightarrow$" is stability and "$\Leftarrow$" follows from monotonicity.

(c). Existence of such an $i$ is $\mathcal{A}$-connectedness. Now assume that there is a $j \ne i$ with
$A \subset B_j$. Then $\emptyset \ne A \subset B_i \cap B_j$, contradicting $B_i \perp B_j$ by (a).

(d). We write $F := \{A_1, \dots, A_k\}$ and $F' := \{A'_1, \dots, A'_{k'}\}$. By (c) we find an injection
$I : F \to F'$ such that $A \subset I(A)$ and hence $k \le k'$. Analogously, we find an injection
$J : F' \to F$ such that $A' \subset J(A')$, and we get $k = k'$. Consequently, $I$ and $J$ are bijections.
Let us now fix an $A_i \in F$. For $A_j := J \circ I(A_i) \in F$ we then find $A_i \subset I(A_i) \subset J(I(A_i)) = A_j$.
This implies $i = j$, since otherwise $A_i \subset A_j$ would contradict $A_i \perp A_j$ by (a). Therefore we
find $A_i = I(A_i)$ and the bijectivity of $I$ thus yields the assertion. ■

Proof of Proposition 3: We first need to check that the support is defined for all restrictions
$\mu|_C := \mu(\cdot \cap C)$ to sets $C \in \mathcal{B}$ that satisfy $0 < \mu(C) < \infty$. To this end, we check that
$\mu|_C$ is inner regular: If $\Omega$ is a Radon space then there is nothing to prove since $\mu|_C$ is a finite
measure. If $\Omega$ is not a Radon space, then the definition of $\mathcal{M}_\Omega^\infty$ guarantees that $\mu$ is inner
regular and hence $\mu|_C$ is inner regular by Lemma 51.

Let us now verify that (A, T) is a (stable) clustering base. To this end, we first 
observe that each Qa € jg ^ probability measure by construction and since we have 

already seen that is inner regular for all C G 'w® conclude that _ 4 /( 

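Moreover, fittedness follows from $\mathcal{A} \subset \mathcal{K}(\mu)$. For flatness let $A, A' \in \mathcal{A}$ with $A \subset A'$ and
$Q_{A'}(A) > 0$. Then for all $B \in \mathcal{B}$ we have

    $$Q_A(B) = \frac{\mu(B \cap A)}{\mu(A)} = \frac{\mu(B \cap A \cap A')}{\mu(A \cap A')} = \frac{\mu(B \cap A \cap A') \cdot \mu(A')}{\mu(A \cap A') \cdot \mu(A')} = \frac{Q_{A'}(B \cap A)}{Q_{A'}(A)}.$$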

Proof of Lemma 5: Let $Q = \sum_{A \in F} \alpha_A Q_A$ and $Q = \sum_{A' \in F'} \alpha'_{A'} Q_{A'}$ be two representations
of a simple measure $Q$. By part (d) of Lemma 51 we then obtain

    $$\operatorname{supp} Q = \operatorname{supp}\Bigl(\sum_{A \in F} \alpha_A Q_A\Bigr) = \bigcup_{A \in F} \operatorname{supp} Q_A = \bigcup_{A \in F} A = \mathbb{G}F$$

and since we analogously find $\operatorname{supp} Q = \mathbb{G}F'$, we conclude that $\mathbb{G}F = \mathbb{G}F'$. The latter
together with Lemma 30 gives $\max F = \max F'$. To show that $\alpha_A = \alpha'_A$ for all roots
$A \in \max F = \max F'$, we pick a root $A \in \max F$ and assume that $\alpha_A < \alpha'_A$. Now, if $A$ has
no direct child, we set $B := A$. Otherwise we define $B := A \setminus (A_1 \cup \dots \cup A_k)$, where the $A_i$
are the direct children of $A$ in $F$. Because of the definition of a direct child and part (d) of
Lemma 30 we find $A_1 \cup \dots \cup A_k \subsetneq A$ in the second case. In both cases we conclude that $B$
is non-empty and relatively open in $A = \operatorname{supp} Q_A$ and by Lemma 51 we obtain $Q_A(B) > 0$.
Consequently, our assumption $\alpha_A < \alpha'_A$ yields $\alpha_A Q_A(B) < \alpha'_A Q_A(B) \le Q(B)$. However,
our construction also gives

    $$Q(B) = \sum_{A'' \in F} \alpha_{A''} Q_{A''}(B) = \alpha_A Q_A(B) + \sum_{A'' \subsetneq A} \alpha_{A''} Q_{A''}(B) + \sum_{A'' \perp A} \alpha_{A''} Q_{A''}(B) = \alpha_A Q_A(B),$$

i.e. we have found a contradiction. Summing up, we already know that $\max F = \max F'$
and $\alpha_A = \alpha'_A$ for all $A \in \max F$. This yields

    $$\sum_{A \in \max F} \alpha_A Q_A = \sum_{A' \in \max F'} \alpha'_{A'} Q_{A'}.$$

Eliminating the roots gives the forests $F_1 := F \setminus \max F$ and $F'_1 := F' \setminus \max F'$ and

    $$Q_1 := \sum_{A \in F_1} \alpha_A Q_A = Q - \sum_{A \in \max F} \alpha_A Q_A = Q - \sum_{A' \in \max F'} \alpha'_{A'} Q_{A'} = \sum_{A' \in F'_1} \alpha'_{A'} Q_{A'},$$

i.e. $Q_1$ has two representations based upon the reduced forests $F_1$ and $F'_1$. Applying the
argument above recursively thus yields $F = F'$ and $\alpha_A = \alpha'_A$ for all $A \in F$. ■

Proof of Theorem 8: We first show that (16) defines an additive clustering. Since Axiom 1
is obviously satisfied, it suffices to check the two additivity axioms for $\mathcal{P} := \mathcal{S}(\mathcal{A})$. We begin
by establishing DisjointAdditivity. To this end, we pick $Q_1, \dots, Q_k \in \mathcal{S}(\mathcal{A})$ with representing
$\perp$-forests $F_i$ such that the sets $\operatorname{supp} Q_i = \mathbb{G}F_i$ are mutually $\perp$-separated. For $A \in \max F_i$ and
$A' \in \max F_j$ with $i \ne j$, we then have $A \perp A'$, and therefore

    $$F := F_1 \cup \dots \cup F_k$$

is the representing $\perp$-forest of $Q := Q_1 + \dots + Q_k$. This gives $Q \in \mathcal{S}(\mathcal{A})$ and

    $$c(Q) = s(F) = s(F_1) \cup \dots \cup s(F_k) = c(Q_1) \cup \dots \cup c(Q_k).$$

To check BaseAdditivity we fix a $Q \in \mathcal{S}(\mathcal{A})$ with representing $\perp$-forest $F$ and a base measure
$\mathfrak{a} = \alpha Q_A$ with $\operatorname{supp} Q \subset \operatorname{supp} \mathfrak{a}$. For all $A' \in F$ we then have $A' \subset \mathbb{G}F = \operatorname{supp} Q \subset A$ and
therefore $F' := \{A\} \cup F$ is the representing $\perp$-forest of $\mathfrak{a} + Q$. This yields $\mathfrak{a} + Q \in \mathcal{S}(\mathcal{A})$ and

    $$c(\mathfrak{a} + Q) = s(F') = s(\{A\} \cup F) = s(\{\operatorname{supp} \mathfrak{a}\} \cup c(Q)).$$

Let us now show that every additive $\mathcal{A}$-clustering $c : \mathcal{P} \to \mathcal{F}$ satisfies both $\mathcal{S}(\mathcal{A}) \subset \mathcal{P}$ and
(16). To this end we pick a $Q \in \mathcal{S}(\mathcal{A})$ with representing forest $F$ and show by induction over
$|F| = n$ that both $Q \in \mathcal{P}$ and $c(Q) = s(F)$. Clearly, for $n = 1$ this immediately follows from
Axiom 1. For the induction step we assume that for some $n \ge 2$ we have already proved
$Q' \in \mathcal{P}$ and $c(Q') = s(F')$ for all $Q' \in \mathcal{S}(\mathcal{A})$ with representing forest $F'$ of size $|F'| < n$.

Let us first consider the case in which $F$ is a tree. Let $A$ be its root and $\alpha_A$ be the
corresponding coefficient in the representation of $Q$. Then $Q' := Q - \alpha_A Q_A$ is a simple measure
with representing forest $F' := F \setminus \{A\}$ and since $|F'| = n - 1$ we know $Q' \in \mathcal{P}$ and $c(Q') = s(F')$
by the induction assumption. By the axiom of BaseAdditivity we conclude that

    $$c(Q) = c(\alpha_A Q_A + Q') = s(\{A\} \cup c(Q')) = s(\{A\} \cup F') = s(F),$$

where the last equality follows from the assumption that $F$ is a tree with root $A$.

Now consider the case where $F$ is a forest with $k \ge 2$ roots $A_1, \dots, A_k$. For $i \le k$ we
define $Q_i := Q|_{\subset A_i}$. Then all $Q_i$ are simple measures with representing forests $F_i := F|_{\subset A_i}$
and we have $Q = Q_1 + \dots + Q_k$. Therefore, the induction assumption guarantees $Q_i \in \mathcal{P}$
and $c(Q_i) = s(F_i)$. Since $\operatorname{supp} Q_i = A_i$ and $A_i \perp A_j$ whenever $i \ne j$, the axiom of
DisjointAdditivity then shows $Q \in \mathcal{P}$ and

    $$c(Q) = c(Q_1) \cup \dots \cup c(Q_k) = s(F_1) \cup \dots \cup s(F_k) = s(F).$$ ■
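
The case distinction in this induction can be traced on a toy model. The following Python sketch is an illustration only and not part of the paper: it assumes that base sets are finite sets of integers, that two sets are separated exactly when they are disjoint, and that a forest is a family of sets that are pairwise nested or disjoint. The helper names `roots`, `restrict` and `axioms_used` are hypothetical. The function records which axiom the proof invokes at each step of the induction over $|F|$.

```python
from typing import FrozenSet, List, Set

Node = FrozenSet[int]


def roots(forest: Set[Node]) -> List[Node]:
    """Return max F: the nodes that are not strictly contained in another node."""
    top = [a for a in forest if not any(a < b for b in forest)]
    return sorted(top, key=sorted)  # fixed order, only for reproducible output


def restrict(forest: Set[Node], a: Node) -> Set[Node]:
    """Return the sub-forest of all nodes contained in a."""
    return {b for b in forest if b <= a}


def axioms_used(forest: Set[Node]) -> List[str]:
    """List the axiom invoked at each step of the induction over |F|."""
    if not forest:
        return []
    if len(forest) == 1:
        return ["Axiom 1"]
    top = roots(forest)
    if len(top) == 1:
        # Tree case: remove the root and apply BaseAdditivity.
        return ["BaseAdditivity"] + axioms_used(forest - {top[0]})
    # Forest case: split along the k >= 2 roots and apply DisjointAdditivity.
    steps = ["DisjointAdditivity"]
    for a in top:
        steps += axioms_used(restrict(forest, a))
    return steps


# Toy forest: one root {1,2,3,4} with the two children {1,2} and {3,4}.
forest = {frozenset({1, 2, 3, 4}), frozenset({1, 2}), frozenset({3, 4})}
steps = axioms_used(forest)
```

For this toy forest the root is peeled off first, then the two remaining trees are handled separately, so every recursive call works on a strictly smaller forest, as the induction requires.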

5.2 Proofs for Section 3


Proof of Lemma lilt For the first assertion it suffices to check ^^-connectedness. To this 
end, we hx an ^ ^ and closed sets Bi,..., with A C U ... U B^. Let (An) C A with 

An A' A. For all re > 1 part (c) of Lemma [30l then gives exactly one i(n) with An C B^^n)- 
This uniqueness together with An C An+i yields i(l) = i(2) = ... and hence An C .Bi(i) for 
all re. We conclude that A C by part (b) of Lemma [30l 

For the second assertion we pick an isomonotone sequence (P„) C J-ji, and define Too := 
lim„s(Tn)- Let us first show that Too is a T-forest. To this end, we pick A, A' € Too- By 
the construction of Too there then exist Ai,A'^ € 'S(Ti) such that for An := Cn(^i) and 
A'n := Qn{A'i) we have An A and A'n A' Now, if Ai T A'^ then An T A'n and thus 
Am T A'^ for all ire, re by isomonotonicity. Using the stability of T twice we first obtain 
A T A'n for all re and then A T A'. If Ai / A'^, we may assume Ai C A'^ since s(Ti) is 
a T-forest. Isomonotonicity implies An C A'n C A' for all re and hence A <Z A'. Finally, 
s{Fn) < Too is trivial. ■ 


Proof of Proposition [16} We first show that A is T-subadditive if A is oop-additive. To 
this end we fix A,A'^A with A wp A'. Since A is oop-additive we hnd B ■.= AVJ A' ^ A. 
This yields 


Qb{A) 


^i{A<r\B) 

KB) 


KB) 


> 0 


and analogously we obtain Qb(A') > 0. For aQA, (FQa' < T we can therefore assume that 
^ Qb{A) ^ Qb{A') ■ Setting b := I3Qb we now obtain by the flatness assumption 


»Qa{-) = a ■ = K- n ^) < b(.). 

Now assume that {A, T) is T-subadditive for all T <C //. Let A^A!^A with A A'. 
Then we have P := Qa + Q'a ^ Qa, Qa' T T. Since A is T-subadditive there is a 

base measure b < T with AVJ A' C. supp b C supp T = A U ^4' by Lemma EH Consequently 
we obtain ^4 U = supp b ^ A. ■ 


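Lemma 31 Let $P \in \mathcal{M}_\Omega$ and $(\mathcal{A}, \mathcal{Q}, \perp)$ be a $P$-subadditive clustering base. Then the kinship
relation $\sim_P$ is a symmetric and transitive relation on $\{B \in \mathcal{B} \mid P(B) > 0\}$ and an equivalence
relation on the set $\{A \in \mathcal{A} \mid \exists \alpha > 0 \text{ such that } \alpha Q_A \le P\}$. Finally, for all finite sequences
$A_1, \dots, A_k \in \mathcal{A}$ of sets that are pairwise kin below $P$ there is $\mathfrak{b} \in \mathcal{Q}_P(A_1 \cup \dots \cup A_k)$.

Proof of Lemma 31: Symmetry is clear. Let $B_1 \sim_P B_2$ and $B_2 \sim_P B_3$ be events with
$P(B_i) > 0$. Then there are base measures $\mathfrak{c} = \gamma Q_C \in \mathcal{Q}_P(B_1 \cup B_2)$ and $\mathfrak{c}' = \gamma' Q_{C'} \in \mathcal{Q}_P(B_2 \cup B_3)$
supporting them. This yields $B_2 \subset C \cap C'$ and thus $P(C \cap C') \ge P(B_2) > 0$. In other
words, we have $C \circ\!\circ_P C'$, and by subadditivity we conclude that there is a $\mathfrak{b} \in \mathcal{Q}_P(C \cup C')$.
This gives $B_1 \cup B_3 \subset C \cup C' \subset \operatorname{supp} \mathfrak{b}$, and therefore $B_1 \sim_P B_3$ at $\mathfrak{b}$. To show reflexivity
on the specified subset of $\mathcal{A}$, we fix an $A \in \mathcal{A}$ and an $\alpha > 0$ such that $\mathfrak{a} := \alpha Q_A \le P$. Then
we have $\mathfrak{a} \in \mathcal{Q}_P(A)$ and hence we obtain $A \sim_P A$.

The last statement follows by induction over $k$, where the initial step $k = 2$ is simply
the definition of kinship. Let us therefore assume the statement is true for some $k \ge 2$. Let
$A_1, \dots, A_{k+1} \in \mathcal{A}$ be pairwise kin. By assumption there is a $\mathfrak{b} \in \mathcal{Q}_P(A_1 \cup \dots \cup A_k)$. Since
this latter yields $A_1 \subset \operatorname{supp} \mathfrak{b}$ we find $A_1 \sim_P \operatorname{supp} \mathfrak{b}$ and by transitivity of $\sim_P$ we hence have
$A_{k+1} \sim_P \operatorname{supp} \mathfrak{b}$. By definition there is thus a $\tilde{\mathfrak{b}} \in \mathcal{Q}_P(A_{k+1} \cup \operatorname{supp} \mathfrak{b})$ and since this gives
$A_1 \cup \dots \cup A_{k+1} \subset A_{k+1} \cup \operatorname{supp} \mathfrak{b} \subset \operatorname{supp} \tilde{\mathfrak{b}}$ we find $\tilde{\mathfrak{b}} \in \mathcal{Q}_P(A_1 \cup \dots \cup A_{k+1})$. ■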

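Lemma 32 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and $Q \in \mathcal{S}(\mathcal{A})$ with representing forest
$F \in \mathcal{F}_{\mathcal{A}}$. Then for all $A \in F$ we have

    $$Q(\cdot \cap A) = \lambda_Q(A) + Q|_{\subsetneq A}.$$

Proof of Lemma 32: Let $A_0 \in \max F$ be the root with $A \subset A_0$. Then we can decompose
$F$ into the disjoint union $F = \{A' \in F : A' \supset A\} \cup \{A' \in F : A' \subsetneq A\} \cup \{A' \in F : A' \perp A\}$. Moreover, flatness
of $\mathcal{Q}$ gives $Q_{A'}(\cdot \cap A) = Q_{A'}(A) \cdot Q_A(\cdot)$ for all $A' \in \mathcal{A}$ with $A \subset A'$ while fittedness gives
$Q_{A'}(A) = 0$ for all $A' \in \mathcal{A}$ with $A' \perp A$ by the monotonicity of $\perp$, part (a) of Lemma 30
and part (b) of Lemma 51. For $B \in \mathcal{B}$ we thus have

    $$Q(B \cap A) = \sum_{A' \supset A} \alpha_{A'} Q_{A'}(B \cap A) + \sum_{A' \subsetneq A} \alpha_{A'} Q_{A'}(B \cap A) + \sum_{A' \perp A} \alpha_{A'} Q_{A'}(B \cap A)$$

    $$= \sum_{A' \supset A} \alpha_{A'} Q_{A'}(A) Q_A(B) + \sum_{A' \subsetneq A} \alpha_{A'} Q_{A'}(B \cap A)$$

    $$= \lambda_Q(A)(B) + Q|_{\subsetneq A}(B),$$

where the last step uses $Q_{A'}(B \cap A) = Q_{A'}(B)$ for $A' \subset A$, which follows from fittedness. ■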


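Lemma 33 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and $\mathfrak{a}, \mathfrak{b}$ be base measures on $A, B \in \mathcal{A}$ with
$A \subset B$. Then for all $C_0 \in \mathcal{B}$ with $\mathfrak{a}(C_0 \cap A) > 0$ we have

    $$\mathfrak{b}(\cdot \cap A) = \frac{\mathfrak{b}(C_0 \cap A)}{\mathfrak{a}(C_0 \cap A)} \cdot \mathfrak{a}(\cdot \cap A).$$

Proof of Lemma 33: By assumption there are $\alpha, \beta > 0$ with $\mathfrak{a} = \alpha Q_A$ and $\mathfrak{b} = \beta Q_B$.
Moreover, flatness guarantees $Q_B(\cdot \cap A) = Q_B(A) \cdot Q_A(\cdot)$. For all $C \in \mathcal{B}$ we thus obtain

    $$\mathfrak{b}(C \cap A) = \beta Q_B(C \cap A) = \beta Q_B(A) \cdot Q_A(C) = \beta Q_B(A) \cdot Q_A(C \cap A) = \frac{\beta Q_B(A)}{\alpha}\, \mathfrak{a}(C \cap A),$$

where in the second to last step we used $Q_A(\cdot) = Q_A(\cdot \cap A)$, which follows from $A = \operatorname{supp} Q_A$.
For $C_0 \in \mathcal{B}$ with $\mathfrak{a}(C_0 \cap A) > 0$ we thus find $\frac{\beta Q_B(A)}{\alpha} = \frac{\mathfrak{b}(C_0 \cap A)}{\mathfrak{a}(C_0 \cap A)}$ and inserting this in the
previous formula gives the assertion. ■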
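
The proportionality stated in Lemma 33 can be checked numerically in a small discrete setting. The sketch below is an illustration under stated assumptions, not the paper's construction: base measures are taken to be multiples of the normalized counting measure on finite sets, which is flat in the sense used above, and the helper `base_measure` as well as the concrete sets and weights are hypothetical. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def base_measure(weight, support):
    """Return the measure S -> weight * |S & support| / |support|."""
    def measure(s):
        return weight * Fraction(len(set(s) & support), len(support))
    return measure


A = {1, 2}
B = {1, 2, 3, 4}                   # A is a subset of B
a = base_measure(Fraction(3), A)   # a = alpha * Q_A with alpha = 3
b = base_measure(Fraction(5), B)   # b = beta * Q_B with beta = 5

C0 = {1, 7}                        # any set with a(C0 & A) > 0
ratio = b(C0 & A) / a(C0 & A)

# Check b(C & A) == ratio * a(C & A) for several test sets C.
test_sets = [{2, 3}, {1, 2, 3}, {4}, set()]
lemma_holds = all(b(c & A) == ratio * a(c & A) for c in test_sets)
```

Here the ratio equals $\beta Q_B(A) / \alpha = 5 \cdot \tfrac{1}{2} / 3 = \tfrac{5}{6}$, in line with the constant that appears in the proof.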


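Lemma 34 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and $Q \in \mathcal{S}(\mathcal{A})$ be a simple measure, $\mathfrak{a}$ be a
base measure on some $A \in \mathcal{A}$, and $C \in \mathcal{B}$. Then the following statements are true:

(a) If $\mathfrak{a} \le Q$ then there is a level $\mathfrak{b}$ in $Q$ with $\mathfrak{a} \le \mathfrak{b}$.

(b) If $\mathfrak{b}(\cdot \cap C) \le \mathfrak{a}(\cdot \cap C)$ for all levels $\mathfrak{b}$ of $Q$ then $Q(C) \le \mathfrak{a}(C)$.

(c) For all $P \in \mathcal{M}_\Omega$ we have $Q \le P$ if and only if $\mathfrak{b} \le P$ for all levels $\mathfrak{b}$ in $Q$.

Proof of Lemma 34: In the following we denote the representing forest of $Q$ by $F$.

(a). By $\mathfrak{a} \le Q$ we find $A \subset \operatorname{supp} Q = \mathbb{G}F$. Since the roots $\max F$ form a finite $\perp$-disjoint
union of closed sets of $\mathbb{G}F$, the $\mathcal{A}$-connectedness shows that $A$ is already contained in one of
the roots, say $A_0 \in \max F$. For $F' := \{A' \in F \mid A \subset A'\}$ we thus have $A_0 \in F'$. Moreover,
$F'$ is a chain, since if there were $\perp$-disjoint $A', A'' \in F'$ then $A$ would only be contained
in one of them by Lemma 30. Therefore there is a unique leaf $B := \min F' \in F$ and thus
$A \subset B$. We denote the level of $B$ in $Q$ by $\mathfrak{b}$. Then it suffices to show $\mathfrak{a} \le \mathfrak{b}$. To this end,
let $\{C_1, \dots, C_k\} = \max F|_{\subsetneq B}$ be the direct children of $B$ in $F$. By construction we know
$A \not\subset C_i$ for all $i = 1, \dots, k$ and hence $\mathcal{A}$-connectedness yields $A \not\subset C_1 \cup \dots \cup C_k$. Therefore
$C_0 := A \setminus \bigcup_i C_i$ is non-empty and relatively open in $A = \operatorname{supp} Q_A$. This gives $\mathfrak{a}(C_0 \cap A) > 0$
by Lemma 51. Let us write $\mathfrak{b} := \lambda_Q(B)$ for the level of $B$ in $Q$. Lemma 32 applied to the
node $B \in F$ then gives

    $$Q(C_0) = \mathfrak{b}(C_0) + Q|_{\subsetneq B}(C_0) = \mathfrak{b}(C_0) + \sum_{A' \in F : A' \subsetneq B} \alpha_{A'} Q_{A'}(C_0) = \mathfrak{b}(C_0)$$

since for $A' \in F$ with $A' \subsetneq B$ we have $A' \subset \bigcup_i C_i$ and thus $\operatorname{supp} Q_{A'} \cap C_0 = A' \cap C_0 = \emptyset$.
Therefore, we find $\mathfrak{a}(C_0 \cap A) = \mathfrak{a}(C_0) \le Q(C_0) = \mathfrak{b}(C_0) = \mathfrak{b}(C_0 \cap B)$. By Lemma 33 we
conclude that $\mathfrak{b}(\cdot \cap A) \ge \mathfrak{a}(\cdot \cap A)$. For $B' \in \mathcal{B}$ the decomposition of $B'$ into the disjoint sets $B' \setminus A$ and $B' \cap A$
and the fact that $A = \operatorname{supp} \mathfrak{a} \subset \operatorname{supp} \mathfrak{b}$ then yields the assertion.

(b). For $A \in F$ we define

    $$B_A := A \setminus \bigcup_{A' \in F : A' \subsetneq A} A',$$

i.e. $B_A$ is obtained by removing the strict descendants from $A$. From this description it is
easy to see that $\{B_A : A \in F\}$ is a partition of $\mathbb{G}F = \operatorname{supp} Q$. Hence we obtain

    $$Q(C) = \sum_{A \in F} Q(C \cap B_A) = \sum_{A \in F} \sum_{A' \in F} \alpha_{A'} Q_{A'}(C \cap B_A)$$

    $$= \sum_{A \in F} \sum_{A' \supset A} \alpha_{A'} Q_{A'}(C \cap B_A) + \sum_{A \in F} \sum_{A' \not\supset A} \alpha_{A'} Q_{A'}(C \cap B_A)$$

    $$= \sum_{A \in F} \lambda_Q(A)(C \cap B_A), \qquad (22)$$

where we used $Q_{A'}(C \cap B_A) = Q_{A'}(C \cap B_A \cap A)$ together with flatness applied to pairs
$A \subset A'$ as well as $A' \cap B_A = \emptyset$ applied to pairs $A' \not\supset A$. Our assumption now yields

    $$Q(C) \le \sum_{A \in F} \mathfrak{a}(C \cap B_A) = \mathfrak{a}(C \cap \operatorname{supp} Q) \le \mathfrak{a}(C).$$

(c). Let $\mathfrak{b} := \lambda_Q(B)$ be a level of $B$ in $Q$ with $\mathfrak{b} \not\le P$. Then there is a $B' \in \mathcal{B}$ with
$\mathfrak{b}(B') > P(B')$ and for $B'' := B' \cap \operatorname{supp} \mathfrak{b} = B' \cap B$ we find $Q(B'') \ge \mathfrak{b}(B'') = \mathfrak{b}(B') >
P(B') \ge P(B'')$. Conversely, assume $\mathfrak{b} \le P$ for all levels $\mathfrak{b}$ in $Q$. By the decomposition (22)
we then obtain

    $$Q(C) = \sum_{A \in F} \lambda_Q(A)(C \cap B_A) \le \sum_{A \in F} P(C \cap B_A) = P(C \cap \operatorname{supp} Q) \le P(C).$$ ■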
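
The partition into the sets $B_A$ used in part (b) of this proof is easy to reproduce on a finite example. The following Python sketch is only an illustration: it assumes a toy forest of finite sets that are pairwise nested or disjoint, and the helper `own_parts`, the point masses in `mass` and the test set `C` are hypothetical. It computes each $B_A$ and then evaluates a measure of $C$ once directly and once part by part.

```python
def own_parts(forest):
    """Map each node A to B_A, i.e. A without its strict descendants."""
    parts = {}
    for a in forest:
        # Union of all nodes strictly contained in a (empty for leaves).
        below = set().union(*(b for b in forest if b < a))
        parts[a] = set(a) - below
    return parts


# Toy forest: root {1,...,6} with children {1,2} and {3,4}; {3} is a child of {3,4}.
forest = [frozenset(range(1, 7)), frozenset({1, 2}), frozenset({3, 4}), frozenset({3})]
parts = own_parts(forest)

# Point masses of a hypothetical measure Q on the ground {1, ..., 6}.
mass = {1: 2, 2: 2, 3: 5, 4: 1, 5: 1, 6: 1}
C = {2, 3, 5, 9}

# Q(C) computed directly and as the sum over the pieces C & B_A.
q_of_c = sum(mass.get(x, 0) for x in C)
q_by_parts = sum(mass.get(x, 0) for part in parts.values() for x in C & part)
```

Because the sets $B_A$ are pairwise disjoint and cover the ground of the forest, both computations agree, which is the first equality in the chain leading to (22).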




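Corollary 35 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base, $Q \in \mathcal{S}(\mathcal{A})$ a simple measure with representing
forest $F$ and $A_1, A_2 \in F$. Then for all $\mathfrak{a} \in \mathcal{Q}_Q(A_1 \cup A_2)$ there exists a level $\mathfrak{b}$ in $Q$
such that $A_1 \cup A_2 \subset B := \operatorname{supp} \mathfrak{b}$ and $\mathfrak{a} \le \mathfrak{b}$.

Proof of Corollary 35: Let us fix an $\mathfrak{a} \in \mathcal{Q}_Q(A_1 \cup A_2)$. Since $\mathfrak{a} \le Q$, Lemma 34 gives a
level $\mathfrak{b}$ in $Q$ with $\mathfrak{a} \le \mathfrak{b}$. Setting $B := \operatorname{supp} \mathfrak{b} \in F$ then gives $A_1 \cup A_2 \subset \operatorname{supp} \mathfrak{a} \subset B$. ■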

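Proof of Proposition 19: Let $Q$ be a simple measure and $Q = \sum_{A \in F} \alpha_A Q_A$ be its unique
representation. Moreover, let $A_1, A_2$ be direct siblings in $F$ and $\mathfrak{a}_1, \mathfrak{a}_2$ be the corresponding
levels in $Q$. Then $Q$-groundedness follows directly from Corollary 35. To show that $A_1, A_2$
are $Q$-motivated and $Q$-fine, we fix an $\mathfrak{a} \in \mathcal{Q}_Q(A_1 \cup A_2)$. Furthermore, let $\mathfrak{b}$ be the level
in $Q$ found by Corollary 35, i.e. we have $A_1 \cup A_2 \subset \operatorname{supp} \mathfrak{b} =: B$ and $\mathfrak{a} \le \mathfrak{b} \le Q$. Now let
$A_3, \dots, A_k \in F$ be the remaining direct siblings of $A_1$ and $A_2$. Since $B$ is an ancestor of $A_1$
and $A_2$ it is also an ancestor of $A_3, \dots, A_k$ and hence $A_1 \cup \dots \cup A_k \subset B$. This immediately
gives $\mathfrak{b} \in \mathcal{Q}_Q(A_1 \cup \dots \cup A_k)$ and we already know $\mathfrak{b} \ge \mathfrak{a}$. In other words, $A_1, A_2$ are $Q$-fine.
Finally, observe that for $B \subset A'$ flatness gives $Q_{A'}(B) Q_B(\cdot) = Q_{A'}(\cdot \cap B)$. Since $A_1 \subset B$
we hence obtain

    $$\mathfrak{a}(A_1) \le \mathfrak{b}(A_1) = \sum_{A' \supset B} \alpha_{A'} Q_{A'}(B) Q_B(A_1) = \sum_{A' \supset B} \alpha_{A'} Q_{A'}(A_1)$$

and since $Q_{A_1}(A_1) = 1$ we also find

    $$\mathfrak{a}_1(A_1) = \sum_{A' \supset A_1} \alpha_{A'} Q_{A'}(A_1) Q_{A_1}(A_1) = \sum_{A' \supset A_1} \alpha_{A'} Q_{A'}(A_1) = \sum_{A' \supsetneq A_1} \alpha_{A'} Q_{A'}(A_1) + \alpha_{A_1}.$$

Since $\alpha_{A_1} > 0$ we conclude that $\mathfrak{a}(A_1) \le (1 - \varepsilon_1)\, \mathfrak{a}_1(A_1)$ for a suitable $\varepsilon_1 > 0$. Analogously,
we find an $\varepsilon_2 > 0$ with $\mathfrak{a}(A_2) \le (1 - \varepsilon_2)\, \mathfrak{a}_2(A_2)$ and taking $\alpha := 1 - \min\{\varepsilon_1, \varepsilon_2\}$ thus yields
$Q$-motivation. ■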

5.2.1 Proof of Theorem 20

Lemma 36 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base, $P \in \mathcal{M}_\Omega$, and $Q, Q' \le P$ be simple measures
on finite forests $F$ and $F'$. If all roots in both $F$ and $F'$ are $P$-grounded, then any root in
one tree can only be kin below $P$ to at most one root in the other tree.

Proof of Lemma 36: Let us assume the converse, i.e. we have an $A \in \max F$ and
$B, B' \in \max F'$ such that $A \sim_P B$ and $A \sim_P B'$. Let $\mathfrak{a}, \mathfrak{b}, \mathfrak{b}'$ be the respective summands
in the simple measures $Q$ and $Q'$. Then $0 < \mathfrak{a}(A) \le Q(A) \le P(A)$ and analogously
$P(B), P(B') > 0$. Then by transitivity of $\sim_P$ established in Lemma 31 we have $B \sim_P B'$
and by groundedness there has to be a parent for both in $F'$, so they would not be roots. ■

Proposition 37 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a stable clustering base and $P \in \mathcal{M}_\Omega$ such that $\mathcal{A}$ is
$P$-subadditive. Let $(Q_n, F_n) \nearrow P$, where all forests $F_n$ have $k$ roots $A_n^1, \dots, A_n^k$, which, in
addition, are assumed to be $P$-grounded. Then $A^i := \bigcup_n A_n^i$ are unique under all such
approximations up to a $P$-null set.

Proof of Proposition 37: The $A^1, \dots, A^k$ are pairwise $\perp$-disjoint by Lemma 11 and by
Lemma 53 they partition $\operatorname{supp} P$ up to a $P$-null set, i.e. $P(\operatorname{supp} P \setminus \bigcup_i A^i) = 0$. Therefore
any $B \in \mathcal{B}$ with $P(B) > 0$ intersects at least one of the $A^i$. Moreover, we have
$0 < Q_1(A_1^i) \le P(A_1^i) \le P(A^i)$, i.e. $P(A^i) > 0$. Now let $(Q'_n, F'_n) \nearrow P$ be another approximation
of the assumed type with roots $B_n^j$ and limit roots $B^1, \dots, B^{k'}$. Clearly, our preliminary
considerations also hold for these limit roots. Now consider the binary relation $i \sim j$, which
is defined to hold iff $A^i \circ\!\circ_P B^j$.

Since $P(A^i) > 0$ there has to be a $B^j$ with $P(A^i \cap B^j) > 0$, so for all $i \le k$ there is a
$j \le k'$ with $i \sim j$. Then, since $A_n^i \cap B_n^j \nearrow A^i \cap B^j$, there is an $n \ge 1$ with $P(A_n^i \cap B_n^j) > 0$.
By $P$-subadditivity of $\mathcal{A}$ we conclude that $A_n^i$ and $B_n^j$ are kin below $P$, and Lemma 36
shows that this can only happen for at most one $j \le k'$. Consequently, we have $k \le k'$ and
$\sim$ defines an injection $i \mapsto j(i)$. The same argument also holds in the other direction and
we see that $k = k'$ and that $i \sim j$ defines a bijection. Clearly, we may assume that $i \sim j$
iff $i = j$. Then $P(A^i \cap B^j) > 0$ if and only if $i = j$, and since both sets of roots partition
$\operatorname{supp} P$ up to a $P$-null set, we conclude that $P(A^i \,\triangle\, B^i) = 0$. ■


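Lemma 38 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and $P \in \mathcal{M}_\Omega$ such that $\mathcal{A}$ is $P$-subadditive.
Moreover, let $\mathfrak{a}_1, \dots, \mathfrak{a}_k \le P$ be base measures on $A_1, \dots, A_k \in \mathcal{A}$ such that $A_1 \circ\!\circ_P A_i$ for
all $2 \le i \le k$. Then there is $\mathfrak{b} \in \mathcal{Q}_P(A_1 \cup \dots \cup A_k)$ and an $\mathfrak{a}_i$ such that $\mathfrak{b} \ge \mathfrak{a}_i$, and if $k \ge 3$
and the $\mathfrak{a}_2, \dots, \mathfrak{a}_k$ satisfy the motivation implication (20) pairwise, then $\mathfrak{b} \ge \mathfrak{a}_1$.

Proof of Lemma 38: The proof of the first assertion is based on induction. For $k = 2$ the
assertion is $P$-subadditivity. Now assume that the statement is true for $k$. Then there is a
$\mathfrak{b} \in \mathcal{Q}_P(A_1 \cup \dots \cup A_k)$ and an $i_0 \le k$ with $\mathfrak{b} \ge \mathfrak{a}_{i_0}$. The assumed $A_1 \circ\!\circ_P A_{k+1}$ thus yields

    $$P(A_{k+1} \cap \operatorname{supp} \mathfrak{b}) \ge P(A_{k+1} \cap A_1) > 0,$$

and hence $P$-subadditivity gives a $\tilde{\mathfrak{b}} \in \mathcal{Q}_P(A_{k+1} \cup \operatorname{supp} \mathfrak{b})$ with $\tilde{\mathfrak{b}} \ge \mathfrak{a}_{k+1}$ or $\tilde{\mathfrak{b}} \ge \mathfrak{b} \ge \mathfrak{a}_{i_0}$. For
the second assertion observe that $\mathfrak{b} \in \mathcal{Q}_P(A_i \cup A_j)$ for all $i, j$ and hence (20) implies $\mathfrak{b} \not\ge \mathfrak{a}_i$
for $i \ge 2$. ■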


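Lemma 39 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and $Q \le P$ be a simple and $P$-adapted
measure with representing forest $F$. Let $C^1, \dots, C^k \in F$ be direct siblings for some $k \ge 2$.
Then there exists an $\varepsilon > 0$ such that:

(a) For all $\mathfrak{a} \in \mathcal{Q}_P(C^1 \cup \dots \cup C^k)$ and $i \le k$ we have $\mathfrak{a}(C^i) \le (1 - \varepsilon) \cdot Q(C^i)$.

(b) Assume that $\mathcal{A}$ is $P$-subadditive and that $\mathfrak{a} \le P$ is a simple measure with $\operatorname{supp} \mathfrak{a} \circ\!\circ_P C^i$
for at least two $i \le k$. Then for all $i \le k$ we have $\mathfrak{a}(C^i) \le (1 - \varepsilon) \cdot Q(C^i)$.

(c) If $\mathcal{A}$ is $P$-subadditive and $Q' \le P$ is a simple measure with representing forest $F'$ such
that there is an $i \le k$ with the property that for all $B \in F'$ we have

    $$B \circ\!\circ_P C^i \implies \exists j \ne i : B \circ\!\circ_P C^j,$$

then $Q'(\cdot \cap C^i) \le (1 - \varepsilon)\, Q(\cdot \cap C^i)$ holds true.

Proof of Lemma 39: Let $\mathfrak{c}_1, \dots, \mathfrak{c}_k$ be the levels of $C^1, \dots, C^k$ in $Q$. Since $Q$ is adapted,
(20) holds for some $\alpha \in (0, 1)$. We define $\varepsilon := 1 - \alpha$.

(a). We fix an $\mathfrak{a} \in \mathcal{Q}_P(C^1 \cup \dots \cup C^k)$, an $i \le k$, and a $j \le k$ with $j \ne i$. Let $\mathfrak{c}_i, \mathfrak{c}_j$ be the
levels of $C^i$ and $C^j$ in $Q$. Since $\alpha\mathfrak{c}_i$ and $\alpha\mathfrak{c}_j$ are motivated, we have $\mathfrak{a} \not\ge \alpha\mathfrak{c}_i$ and $\mathfrak{a} \not\ge \alpha\mathfrak{c}_j$.
Hence, there is a $C_0 \in \mathcal{B}$ with $\mathfrak{a}(C_0) < \alpha\mathfrak{c}_i(C_0)$ and thus also $\mathfrak{a}(C_0 \cap C^i) < \alpha\mathfrak{c}_i(C_0 \cap C^i)$.
Lemma 33 then yields $\mathfrak{a}(\cdot \cap C^i) \le \alpha\mathfrak{c}_i(\cdot \cap C^i)$ and the definition of levels gives

    $$\mathfrak{a}(C^i) \le \alpha\mathfrak{c}_i(C^i) = \alpha Q(C^i) = (1 - \varepsilon)\, Q(C^i).$$

(b). We may assume $\operatorname{supp} \mathfrak{a} \circ\!\circ_P C^1$ and $\operatorname{supp} \mathfrak{a} \circ\!\circ_P C^2$. By the second part of Lemma 38
applied to $\operatorname{supp} \mathfrak{a}$, there is an $\mathfrak{a}' \in \mathcal{Q}_P(\operatorname{supp} \mathfrak{a} \cup C^1 \cup C^2) \subset \mathcal{Q}_P(C^1 \cup C^2)$ with $\mathfrak{a}' \ge \mathfrak{a}$,
and since $Q$ is $P$-fine, we may actually assume that $\mathfrak{a}' \in \mathcal{Q}_P(C^1 \cup \dots \cup C^k)$. Now part (a)
yields $\mathfrak{a}'(C^i) \le (1 - \varepsilon) \cdot Q(C^i)$ for all $i = 1, \dots, k$.

(c). We may assume $i = 1$. Our first goal is to show

    $$\mathfrak{b}(\cdot \cap C^1) \le (1 - \varepsilon)\, \mathfrak{c}_1(\cdot \cap C^1) \qquad (23)$$

for all levels $\mathfrak{b}$ in $Q'$. To this end, we fix a level $\mathfrak{b}$ in $Q'$ and write $B := \operatorname{supp} \mathfrak{b}$. If $P(B \cap C^1) = 0$,
then (23) follows from

    $$\mathfrak{b}(C^1) = \mathfrak{b}(B \cap C^1) \le P(B \cap C^1) = 0.$$

In the other case we have $B \circ\!\circ_P C^1$ and our assumption gives a $j \ne 1$ with $B \circ\!\circ_P C^j$. By
the second part of Lemma 38 we find an $\mathfrak{a} \in \mathcal{Q}_P(B \cup C^1 \cup C^j) \subset \mathcal{Q}_P(C^1 \cup C^j)$ with $\mathfrak{a} \ge \mathfrak{b}$,
and by (a) we thus obtain $\mathfrak{a}(C^1) \le (1 - \varepsilon)\, Q(C^1) = (1 - \varepsilon)\, \mathfrak{c}_1(C^1)$. Now, Lemma 33 gives
$\mathfrak{a}(\cdot \cap C^1) \le (1 - \varepsilon)\, \mathfrak{c}_1(\cdot \cap C^1)$ and hence (23) follows.

With the help of (23) we now conclude by part (b) of Lemma 34 that $Q'(\cdot \cap C^1) \le
(1 - \varepsilon)\, \mathfrak{c}_1(\cdot \cap C^1)$ and using $\mathfrak{c}_1(\cdot \cap C^1) \le Q(\cdot \cap C^1)$ we thus obtain the assertion. ■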

Lemma 40 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base and $P \in \mathcal{M}_\Omega$ such that $\mathcal{A}$ is $P$-subadditive.
Moreover, let $Q, Q' \le P$ be simple $P$-adapted measures on $F, F'$, and $S \in s(F)$ and $S' \in
s(F')$ be two nodes that have children in $s(F)$ and $s(F')$, respectively. Let

    $$\{C^1, \dots, C^k\} = \max s(F)|_{\subsetneq S} \quad\text{and}\quad \{D^1, \dots, D^{k'}\} = \max s(F')|_{\subsetneq S'}$$

be their direct children and consider the relation $i \sim j :\iff C^i \circ\!\circ_P D^j$. Then we have $k, k' \ge 2$
and if $\sim$ is left-total, i.e. for every $i \le k$ there is a $j \le k'$ with $i \sim j$, then it is right-unique,
i.e. for every $i \le k$ there is at most one $j \le k'$ with $i \sim j$.

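Proof of Lemma 40: The definition of the structure of a forest gives $k, k' \ge 2$. Moreover,
we note that $P(A) \ge Q(A) > 0$ for all $A \in F$ and $P(A) \ge Q'(A) > 0$ for all $A \in F'$.
Now assume that $\sim$ is not right-unique, say $1 \sim j$ and $1 \sim j'$ for some $j \ne j'$. Applying
$P$-subadditivity twice we then find a $\mathfrak{b} \in \mathcal{Q}_P(C^1 \cup D^j \cup D^{j'})$ with $\mathfrak{b} \ge \mathfrak{c}_1$ or $\mathfrak{b} \ge \mathfrak{d}_j$ or
$\mathfrak{b} \ge \mathfrak{d}_{j'}$, where $\mathfrak{c}_1$, $\mathfrak{d}_j$, and $\mathfrak{d}_{j'}$ are the corresponding levels. Since $D^j, D^{j'}$ are motivated we
conclude that $\mathfrak{b} \ge \mathfrak{c}_1$. Now, because of $\mathcal{Q}_P(C^1 \cup D^j \cup D^{j'}) \subset \mathcal{Q}_P(D^j \cup D^{j'})$ and $P$-fineness
of $Q'$ there is a $\mathfrak{b}' \in \mathcal{Q}_P(D^1 \cup \dots \cup D^{k'})$ with $\mathfrak{b}' \ge \mathfrak{b}$. Now pick a direct sibling of $C^1$, say
$C^2$. Then there is a $j''$ with $2 \sim j''$, and since $B' := \operatorname{supp} \mathfrak{b}' \supset D^1 \cup \dots \cup D^{k'}$ this implies
$P(B' \cap C^2) \ge P(D^{j''} \cap C^2) > 0$. By $P$-subadditivity we hence find a $\mathfrak{b}'' \in \mathcal{Q}_P(B' \cup C^2) \subset
\mathcal{Q}_P(C^1 \cup C^2)$ with $\mathfrak{b}'' \ge \mathfrak{b}'$ or $\mathfrak{b}'' \ge \mathfrak{c}_2$. Clearly, $\mathfrak{b}'' \ge \mathfrak{c}_2$ violates the fact that $C^1, C^2$ are
motivated, and thus $\mathfrak{b}'' \ge \mathfrak{b}'$. However, we have shown $\mathfrak{b}' \ge \mathfrak{b} \ge \mathfrak{c}_1$, and thus $\mathfrak{b}'' \ge \mathfrak{c}_1$. Since
this again violates the fact that $C^1, C^2$ are motivated, we have found a contradiction. ■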
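
The two properties of the relation $i \sim j$ named in Lemma 40 can be made concrete on finite data. The Python sketch below is an illustration only: it assumes that the children $C^i$ and $D^j$ are finite sets and that $P$ is given by point masses, and the helpers `overlap_relation`, `is_left_total` and `is_right_unique` are hypothetical names. It does not model $P$-adaptedness, so it merely tests the two properties and does not reproduce the implication proved in the lemma.

```python
def overlap_relation(cs, ds, mass):
    """Return all pairs (i, j) with P(C^i & D^j) > 0, using 1-based indices."""
    return {
        (i, j)
        for i, c in enumerate(cs, start=1)
        for j, d in enumerate(ds, start=1)
        if sum(mass.get(x, 0) for x in c & d) > 0
    }


def is_left_total(rel, k):
    """Every i in 1..k is related to at least one j."""
    return all(any(a == i for a, _ in rel) for i in range(1, k + 1))


def is_right_unique(rel):
    """No i is related to two different j."""
    images = {}
    for i, j in rel:
        if images.setdefault(i, j) != j:
            return False
    return True


mass = {x: 1 for x in range(1, 7)}   # hypothetical uniform point masses
cs = [{1, 2}, {3, 4}]                # children C^1, C^2
ds = [{1, 2, 5}, {3, 4, 6}]          # children D^1, D^2
rel = overlap_relation(cs, ds, mass)

# A configuration in which C^1 overlaps two different D^j.
rel_bad = overlap_relation(cs, [{1}, {2, 3, 4}], mass)
```

In the first configuration the relation is both left-total and right-unique, so it is the graph of a map from $\{1, \dots, k\}$ to $\{1, \dots, k'\}$. In the second configuration $C^1$ overlaps both $D^1$ and $D^2$, which is the situation the proof rules out for adapted measures.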

Proof of Theorem 1201 We prove the theorem by induction over the generations in the 
forests. For a finite forest F, we define sq{F) := maxP and 

SAr+i(P) := S]\f{F) U { H € s{F) | H is a direct child of a leaf in S]\f{F) }. 

We will now show by induction over N that there is a graph-isomorphism si\f{Foo) —>■ 
sn{F^) with P{AA(^]\f{A)) = 0 for all A G SAr(Poo)- For N = 0 this has already been shown 
in Proposition [371 Let us therefore assume that the statement is true for some > 0. Let 
us fix an 5 G minsAr(Fc>o) and let S' := Cn{S) G minsjv(P4)) Le the corresponding node. 
We have to show that both have the same number of direct children in SAr+i(-) and that 
these children are equal up to P-null sets. By induction this then finishes the proof. 

Since S G SAr(Poo) C s{Foq), the node S has either no children or at least 2. Now, if 
both S and S' have no direct children then we are finished. Hence we can assume that S 
has direct children (7^,..., for some k > 2, i.e. 

Let Sn, Cji,... ,Cn G s{Fn) and S'^ G s{Fn) be the nodes that correspond to S, C ^,..., C^, 
and S', respectively. Since P{SAS') = 0 we then obtain for alH < fc 

p{s'n&) = Pisnc') = P{C') > Qi((7*) > Qi{ci) > o, 

that is 5' cjDp (7® for all i < k. Since S' = [J^ S!^ and (7® = |J^ this can only happen if 
S'^ (30P (7^ for all sufficiently large n. We therefore may assume without loss of generality 
that 

S'l Qop Cl^ for alH < A; and all n > 1. (24) 

Let us now investigate the structure of P)) . To this end, we will seek a kind of anchor 

G F!A , which will turn out later to be the direct parent of the yet to find Cai-1-1(17®) G 
F'^. We define this anchor by 


:= min{P G P)( | P oop C\ for alH = 1,..., k}. 


This minimum is unique. Indeed, let B'^ be any other minimum with B!^ cjop (7f for all 
i < k. Since both are minima, none is contained in the other and because F^ is a forest this 
means B!^ ± B'^. Let b'^ and b'^ be their levels in Q'^. Since Q'^ is P-adapted, these two 
levels are motivated. This means that there can be no base measure majorizing one of them 
and supporting B'^ U B'^. On the other hand, by the second part of Lemma [38l there exists a 
b" G Qp(P(jUC^U-•-UCf) with b" > b'^. Now because of P(P(jnsupp b") > P(P^n(7|) > 0 
and P-subadditivity there exists a base measure majorizing b'^ > b'^ or b" and supporting 
nsuppb((. This contradicts the motivatedness of b'^ and b(j and hence the minimum B'^ 
is unique. 
Since B'^ is the unique minimum among all B ^ with B aop C\ for all i, we also have
B'^ C B for all such B and hence B'^ C S'^ by (|2^ . The major difficulty in handling B'^ 
though is that it may jump around as a function of n: Indeed we may have €z F^\ s{F^) 
and therefore the monotonicity s{F^) < s{F^_^i) says nothing about B'^. In particular, we 
have in general B'^ ^'n+i- 

Let us now enumerate the set min of direct children of B'^ by , • • •, F )^, 

where kn >0- Again these can jump around as a function of n. The number kn specifies 
different cases: we have B'^ € minF,^, i.e. B'^ is a leaf, iff kn = 0; on the other hand 

€ s{F^) iS kn > 2. Next we show that for all i < fc and all sufficiently large n there is 
an index j(i, n) E {1,..., kn} with 

C\ QDp . (25) 

Note that this in particular implies kn > 1 for sufficiently large re. To this end we fix 

an i < A:. Suppose that Cj _LLp {F>n^ U ••• U Dn}F) for infinitely many rei,re 2 ,_ By 

construction B'^^ is the smallest element of F^^ that _LLp-intersects C}. More precisely, for 
any A E with A oop CJ we have A D B^^ and therefore A aop for all such A and 
all i' < k. Hence, all Q'n^ in this subsequence fulfill the conditions of the last statement in 
Lemma [Ml and we get an e > 0 such that for all such rim 

Q'nJCl) < (1 - e)Qi(Ci) < (1 - e)F(C'i) (26) 

which contradicts Q'^^{Cl) f F(C'j) since F(Cj) > 0. 

Therefore for alH < fc and all sufficiently large re there is an index j(i, re) such that (|25p 
holds. Clearly, we may thus assume that there is such an j{i, re) for all re > 1. Since j{i, re) E 
{1,..., A:„} we conclude that > 1 for all re > 1. Moreover, A:„ = 1 is impossible, since 
kn = 1 yields j{i,n) = 1, and this would mean, that C} wp D\ for all z < fc contradicting 
that B'^ is the minimal set in F^ having this property. Consequently B'j^ has the direct 
children F^,..., F^" where A:„ > 2 for all re > 1. 

So far we have seen that F^,...,F^" E s{F^) are inside S'n- Therefore S'n is not a 
leaf, and hence S' ^ minF^^^ as well. But still for infinitely many re these Dn might not 
be the direct children of S'n- Let us therefore denote the direct children of S'n E s(F))) by 
F^,..., En E s{Fn), where we pick a numbering such that F^ C F^_(_^ and by the definition 
of the structure of a forest we have k' > 2. 

For an arbitrary but fixed re we now show {D \,..., F^"} = {F^,..., F^ }. To this let 
us assume the converse. Since the En are the direct children of S'n in the structure s(F))) 
there is a jn < k' with Dn C F^" for all j, and since B'n is the direct parent of the Dh we 
conclude that B'n C E^n ■ Therefore we have C\a>p for all i < k. Since Qi and Q'n are 
adapted we can use Lemma HOl to see that for all z < A: we have Cf _LLp En for all j ^ jn- 
Let us fix a j 7 ^ jn- Our goal is to show 

Qm{Ei)<{l-e)Q'n{Ei), 

for all sufficiently large m > n, since this inequality contradicts the assumed convergence of 
Qm{Ei) to P{Ei) > Q'n{Ei) > 0. By part (c) of Lemma [39] with Q'n as Q and Qm as Q' it 
suffices to show that for all A E Em and all sufficiently large m > n we have 

A 00p E^^ A QDp E^ . (27) 
To this end, we fix an ^ € Fm with A aop En- Then we first observe that for all m > n
we have P{A n S'^) > P{A D S'^) > P{A n eI_) > 0. Moreover, the induction assumption 
ensures P{SAS') = 0 and since Sm S and ^ S', we conclude that P{A n Sm) > 0 
for all sufficiently large m. Now, U • • • U are direct siblings and hence we either have 

U • • • U C A or A C for exactly one io <k. In the first case we get 

P{A n Eir) > Pici nEt)> P{cl n Et) > o 

by the already established C\ rop E^ for all i < k. The second case is impossible, since it 
contradicts adaptedness. Indeed, A C implies cap En and by the already established 
Cl cap E^ for all i < k, we also know cap E^. By the second part of Lemma 1551 we 
therefore find a c G Qp{Cl^ U eIu E^A) with c > c^, where is the level of in Qm- 
Now fix any i < k with i ^ io and observe that we have P{Cln H suppc) > P{Cln Ll E^) > 
P{Cl CiEt') > 0 , and hence P-subadditivity yields a c" G Qp(C'^ Usuppc) with c" > or 
c" > c > c^, where is the level of in Qm- Since c" G Qp(C'^Usuppc) C Qp(C'^UC')))), 
we have thus found a contradiction to the fact that the direct siblings and C))) are P- 
motivated. 

So far we have shown {D ^,..., = {E^,..., E!^ } and kn = k' for all n. Without 

loss of generality we may thus assume that Dh = En for all n and all j < k'. In particular, 
this means that the direct children of S'n in equal the direct children of B'^ in P)). Let 

us write 

$$D^j := \bigcup_{n \ge 1} D_n^j, \qquad j = 1, \dots, k'$$

and $i \sim j$ iff $C_1^i \mathrel{\circ\!\circ}_P D_1^j$. We have seen around (25) that for all $i \le k$ there is at least one 
$j \le k_1 = k'$ with $i \sim j$, namely $j(i, 1)$. By Lemma 40 we then conclude that $j(i, 1)$ is the 
only index $j \le k'$ satisfying $i \sim j$. By reversing the roles of $C_1^i$ and $D_1^j$, which is possible 
since $D_n^j = E_n^j$ is a direct child of $S'_n$ in $s(F'_n)$, we can further see that for all $j$ there is 
an index $i$ with $i \sim j$, and again by Lemma 40 we conclude that there is at most one $i$ with 
$i \sim j$. Consequently, $i \sim j$ defines a bijection between $\{C_1^1, \dots, C_1^k\}$ and $\{D_1^1, \dots, D_1^{k'}\}$ and 
hence we have $k = k'$. Moreover, we may assume without loss of generality that $i \sim j$ iff 
$i = j$. From the latter we obtain $C_1^i \mathrel{\circ\!\circ}_P D_1^j$ iff $i = j$. 

To generalize the latter, we fix n,m > 1 and write i ~ j iff aop Dm- Since we have 
P{C'n n Din) > P{Ci n D\) > 0, we conclude that i i, and by Lemma 1401 we again see 
that i ^ j is false for i ^ j- This yields Cn cap Dm IS i = j and by taking the limits, we 
find C cap D^ iff i = j. 

Next we show that $P(C^i \triangle D^i) = 0$ for all $i \le k$. Clearly, it suffices to consider the case 
$i = 1$. To this end assume that $R := C^1 \setminus D^1$ satisfies $P(R) > 0$. For $R_n := R \cap C_n^1 = C_n^1 \setminus D^1$, 
we then have $R_n \uparrow R$ since $C_n^1 \uparrow C^1$ and $R \subset C^1$. Consequently, $0 < P(R) = P(R \cap C^1)$ 
implies $P(R_n) > 0$ for all sufficiently large $n$. On the other hand, we have $P(R \cap D^1) = 0$ 
by the definition of $R$ and $P(R \cap D^j) \le P(C^1 \cap D^j) = 0$ for all $j \ne 1$ as we have shown 
above. 

We next show that Q'm{Rn) = Q'm\-,pi {Rn)- To this end it suffices to show that for 
any A G E^ with A ^ P))^| we have Q'm{A D Rn) < P{A D Rn) = 0. Let us thus fix an 
A G Fin with A ^ Fin K r/ • Then we either have A F or ^ T - In the first case there 
is j < k with A C Dm which means, as shown above, that P{Ar]Rn) < P{DmfiRn) = 0. In 
the second case, by definition of structure, we even have A _L S!^. So there is a A'^ € s{F^) 
with A C A'j^ and A'^ _L S'm and by isomonotonicity of the structure there is A' € F'^ 
with A'^ C A' and A' _L S'. Hence by induction assumption P(A n < P{A f] Sn) < 

p{A ns)< P{A' ns) = P{A' n s') = o. 

Using P{C'' n -D*) > 0 we now observe that fulfills the conditions of part (c) of 

Lemma [39] for and and by R^ C we thus obtain 

Qmi^n) = Qm\^B' < (1 “ ^)QniRn) < (1 “ s)P{Rn)- 

This contradicts $0 < P(R_n) = \lim_{m \to \infty} Q'_m(R_n)$. So we can assume $P(R_n) = 0$ for all $n$ 
and therefore $P(R) = \lim_{n \to \infty} P(R_n) = 0$. By reversing roles we thus find $P(D^1 \triangle C^1) = 
P(C^1 \setminus D^1) + P(D^1 \setminus C^1) = 0$ and therefore the children are indeed the same up to $P$-null 
sets. 

Finally, we are able to finish the induction: To this end we extend to the map 
Cat+i: S]si+i{Foo) sm+i{F'oc) by setting, for every leaf S G mins7v(^oo), 

C7v+i(C*) ■.= D' 

where ..., € sn+i{Foo) are the direct children of S and ..., G S 7 v+i(Tj^) are 

the nodes we have found during our above construction. Clearly, our construction shows 
that Cai+i is a graph isomorphism satisfying P(AA^Af+i(A)) = 0 for all A G SAr+i(.I^oo)- B 

5.2.2 Proof of Theorem 21 

Lemma 41 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base, $P_1, \dots, P_k \in \mathcal{M}$ with $\operatorname{supp} P_i \perp \operatorname{supp} P_j$ 
for all $i \ne j$, and $Q_i \le P_i$ be simple measures with representing forests $F_i$. We define 
$P := P_1 + \dots + P_k$, $Q := Q_1 + \dots + Q_k$, and $F := F_1 \cup \dots \cup F_k$. Then we have: 

(a) The measure $Q$ is simple and $F$ is its representing $\perp$-forest. 

(b) For all base measures $a \le P$ there exists exactly one $i$ with $a \le P_i$. 

(c) If $\mathcal{A}$ is $P_i$-subadditive for all $i \le k$, then $\mathcal{A}$ is $P$-subadditive. 

(d) If $Q_i$ is $P_i$-adapted for all $i \le k$, then $Q$ is adapted to $P$. 

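Proof of Lemma 41 (a). Since $Q_i \le P_i \le P$ we have $GF_i = \operatorname{supp} Q_i \subset \operatorname{supp} P_i$. By the 
monotonicity of $\perp$ we then obtain $GF_i \perp GF_j$ for $i \ne j$. From this we obtain the assertion. 

(b). Let $a \le P$ be a base measure on $A \in \mathcal{A}$. Then we have $A = \operatorname{supp} a \subset \operatorname{supp} P = 
\bigcup_i \operatorname{supp} P_i$. By $\mathcal{A}$-connectedness there thus exists an $i$ with $A \subset \operatorname{supp} P_i$. For $B \in \mathcal{B}$ we then 
find $a(B) = a(B \cap \operatorname{supp} P_i) \le P(B \cap \operatorname{supp} P_i) = P_i(B \cap \operatorname{supp} P_i) = P_i(B)$. Moreover, for 
$j \ne i$ we have $a(A) > 0$ and $P_j(A) = 0$, and thus $i$ is unique. 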

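(c). Let $a, a' \le P$ be base measures on base sets $A, A'$ with $A \mathrel{\circ\!\circ}_P A'$. Since $A \perp A'$ implies 
$A \perp_\emptyset A'$, we have $A \mathrel{\circ\!\circ} A'$. By (b) we find unique indices $i, i'$ with $a \le P_i$ and $a' \le P_{i'}$. 
This implies $A \subset \operatorname{supp} P_i$ and $A' \subset \operatorname{supp} P_{i'}$, and hence we have $\operatorname{supp} P_i \mathrel{\circ\!\circ} \operatorname{supp} P_{i'}$ by 
monotonicity. This gives $i = i'$, i.e. $a, a' \le P_i$. Since $\mathcal{A}$ is $P_i$-subadditive there now is an 
$\tilde a \in \mathcal{Q}_{P_i}(A \cup A')$ with $\tilde a \ge a$ or $\tilde a \ge a'$, and since $\tilde a \le P_i \le P$ we obtain the assertion. 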

(d). From (b) we conclude $\mathcal{Q}_P(A_1 \cup A_2) = \emptyset$ for all roots $A_1 \in F_i$ and $A_2 \in F_j$ and all 
$i \ne j$. This can be used to infer the groundedness and fineness of $Q$ from the groundedness 
and fineness of the $Q_i$. Now let $a, a' \le P$ be the levels of some direct siblings $A, A' \in F$ in 
$Q$ and $b \in \mathcal{Q}_P(A \cup A')$ be any base measure. By (b) there is a unique $i$ with $b \le P_i$, and 
hence $a, a' \le P_i$ as well. Therefore $Q$ inherits strict motivation from $Q_i$. ■ 


Lemma 42 Let $(\mathcal{A}, \mathcal{Q}, \perp)$ be a clustering base, $P \in \mathcal{M}$, $a$ be a base measure on $A \in \mathcal{A}$ 
with $\operatorname{supp} P \subset A$, and $Q \le P$ be a simple measure with representing forest $F$. We define 
$P' := a + P$, $Q' := a + Q$, and $F' := \{A\} \cup F$. Then the following statements hold: 

(a) The measure $Q'$ is simple and $F'$ is its representing $\perp$-forest. 

(b) Let $a' \le P'$ be a base measure on $A'$. Then either $a' \le a$ or there is an $\alpha \in (0, 1)$ such 
that $a'(\cdot \cap A') = a(\cdot \cap A') + \alpha a'(\cdot \cap A')$. 

(c) If $\mathcal{A}$ is $P$-subadditive then $\mathcal{A}$ is $P'$-subadditive. 

(d) If $Q$ is $P$-adapted, then $Q'$ is $P'$-adapted. 

Proof of Lemma 42 (a). We have $GF = \operatorname{supp} Q \subset \operatorname{supp} P \subset A$ and hence $F'$ is a 
$\perp$-forest, which is obviously representing $Q'$. 

(b) . Let us assume that a' ^ a, i.e. there is a Cq G with a'(Co) > ci(C'o) and thus we find 

o'(ConA') = a'(C'o) > 0(6*0) > a{CQr\A'). In addition, we have A' = suppo' C suppo = A, 
and therefore Lemma 1551 shows o(- fl A') = 7o'(- fl A'), where 7 ;= < 1 - Setting 

a := 1 — 7 yields the assertion. 

(c) . Let 01,02 < P' be base measures on sets ^1,^2 ^ A with Ai aopi A2. Since suppP' = 
A, we have AiU A2 C A, and thus o G QpfiAi U ^2)- Clearly, if o > Oi or 0 > 02, there is 
nothing left to prove, and hence we assume Oi ^ o and 02 ^ 0. Then (b) gives a* G ( 0 , 1 ) 
with Oj(-n^j) = a{-nAi)-\-aiai{-nAi). We conclude that o(-n^j)+ajOj(-nAj) = afipAi) < 
P'{- n Ai) = o(- n Ai) + P(- n Ai), and thus 0*0, = aiOj(- n Afi < P(- fl Afi < P. Since A 
is P-subadditive, we thus find an o G Qp{A\ U A2) with say 0 > oiOi. For A := suppo we 
then have 

o' := o(- n Ji) + o(- n Jl) > o(- n A) + aioi(- n Jl) > o(- n ^i) + aioi(- n ^i) = oi, 

where we used supp Oi = C A. Moreover A = supp 0 C supp P C A, together with 
flatness of Q shows that a! is a base measure, and we also have 0' < o + o< o + P = P'. 
Finally we observe that AiU A2 A = supp o', and hence o' G QpfiAi U A2). 

(d) . Clearly, P' is grounded because it is a tree. Now let Ai,..., A^ G P', A: > 2 be direct 
siblings and o( be their levels in Q'. Since A is the only root it has no siblings, so for all i 
we have Ai G P. Moreover, the levels o* of Ai in Q are P-motivated and P-fine since Q is 
P-adapted. Now let b G QpfiAi D A2) and B := suppb. 

To check that Q' is P'-fine, we first observe that in the case b < 0 there is nothing 
to prove since o G QpfiAi U ... U by construction. In the remaining case b ^ 0 we 
find a /3 > 0 with b(- n P) = o(- n P) + fib{- fl P) by (b), and by P-fineness of Q, there 
exists a b G Qp{Ai U ... U AQ with b > fib. Since supp b C suppP C suppo we see that 
a + b is a simple measure, and hence we can consider the level b' of supp b in a + b. Since 
b'<a+b<a + P< P', we then obtain b' G Qpi{Ai U ... U Ak) and for C € B we also have 

b{C) = b{C r^B) = a{C n S) + /3b(C r^B)< a{C n S) + b{C D B) = b'{C D B) < b'{C). 

To check that Q' is strictly P'-nrotivated we fix the constant a G (0,1) appearing in the 
strict P-motivation of Q. Then there are dj G (0,1) such that o(- H Aj) + aOj = dja'. We 
set d := max{di,d 2 } G (0,1) and obtain a(- n Ai) + aOi < da' for both i = 1,2. Let us 
first consider the case b < a. Since our construction yields a' = a(- n Ai) + a* ^ a, there is a 
Co G P with a'(Co) > a(Co). This implies da'(Co) > a(Co n Ai) + Qai(Co) > a(Co n Ai) > 
b(Co n Aj), i.e. b ^ da'. Consequently, it remains to consider the case b ^ a. By (b) and 
supp b C suppP' = A there is a /3 G (0,1] with b(- n P) = a(- n P) + /3b(- D P). Then 

/3b = /3b(- n P) = b(- n P) - a(- n P) < P'(- n P) - a(- n P) = P(- n P) < P , 

and since /3b G Qp{Ai U A 2 ) we obtain /3b ^ aa* for i = 1,2. Hence there is an event 
Co C supp b with /3b(Co) < aaj(Co), which yields b(ConAj) = a(ConAinP)+/3b(ConAi) < 
a(Co n A) + aaj(Co n Aj) < da'(Co n Aj), i.e. b ^ da'. ■ 

Proof of Theorem [21} For a P G iS(A) and a P-adapted isomonotone sequence {Qn-, Pn) 

P we define c_a{P) :=p hm„_).oo s(Pn), which is possible by Theorem 1201 By Proposition 
m we then now that Cj\^{Q) = c{Q) for all Q G Q, and hence satisfies the Axiom of 
BaseMeasureClustering. Furthermore, c _4 is obviously structured and scale-invariant, and 
continuity follows from Theorem 1201 

To check that is disjoint-additive, we fix Pi,..., Pfc G P^i with pairwise T-disjoint 
supports and let Pi be Pj-adapted isomonotone sequences of simple measures. 

We set Qn '■= Qh P ■ ■ ■ + Qn P := Pi P^. By Lemma jlT] Qn is simple on 

Fn := P^ U • • • U Fn and P-adapted, and A is P-subadditive. Moreover, we have Qn P 
and s{Fn) = Ujinherits monotonicity as well. Therefore {Qn,Pn) ^ P is P-adapted 
and lim s(P,i) = IJ-lims(P^) implies disjoint-additive. 

To check BaseAdditivity we fix a P G P^t and a base measure a with suppP C a. 
Moreover, let {Qn,Pn) P be a P-adapted sequence. Let Qn := a -|- Qn and P' := 
a -|- P. Then by Lemma |32| is simple on P/^ := {A} U Fn and P'-adapted, and A is 
P'-subadditive. Furthermore we have [Q'n,Fn) P' and therefore we find P' G Vji, and 
lims(P/^) = s({A} Ulims(P„)). 

For the uniqueness we finally observe that Theorem 8 together with the Axioms of 
Additivity shows equality on $\mathcal{S}(\mathcal{A})$, and the Axiom of Continuity in combination with Theorem 
20 extends this equality to $\mathcal{P}_{\mathcal{A}}$. ■ 

5.2.3 Proof of Theorem [23] 

Lemma 43 Let jj, G and consider {A, 

(a) If A, A' G A with A C A' fi-a.s. then A C A'. 

(b) Let P G A4fi such that A is P-subadditive and P has a ^-density f that is of (A, Q, T)- 
type with a dense subset A such that s{Ff^\) is finite. For all X ^ A and all Ai,..., A^. G 
A with Ai U... U Afc C {/ > A} pL-a.s. there is B ^ A with Ai U... U A^ C B pointwise 
and B C {f > X} pL-a.s. 
Proof of Lemma 43 (a). Let $A, A' \in \mathcal{A}$ with $A \subset A'$ $\mu$-a.s. and let $x \in A$. Now 
$B := A \setminus A'$ is relatively open in $A$, and if it is non-empty then $\mu(B) > 0$ since $A$ is a support 
set. Since by assumption $\mu(B) = 0$ we have $B = \emptyset$. 

(h). Since H ■.= {f > X\ G A there is an increasing sequence Cn H of base sets. Let 
dbn := XlB„dn G Qp. For all i < k eventually B^ Ai, so there is a n s.t. B^ is 
connected to all of them. By P-subadditivity between bn and Al^i dfjL ,..., Al^^. dfj, there is 
dc = X'lcdfj, G Qp that supports all of them and majorizes at least one of them. Hence 
A < A' and thus Hi U ... U C C C {/ > A'} C {/ > A} fi-a.s. By (a) we are finished. ■ 


Lemma 44 Let f be a density of {A, Q, -L)-type, set P := f dfj, and assume A is P- 
subadditive and P/,a is a ehain. For all k > 0 and all n G N let Bn = Ci U ... U Cfc 
be a (possibly empty) union of base sets Ci,... ,Ck G A with Bn C { / > A } for all A E A. 
Then P := f dpi G V and there is {Qn, Fn) P adapted where for all n Fn is a ehain and 
Bn C minP„. 


Proof of Lemma 1441 Let (A„)„ C A be a dense countable subset with Xn < p and set 
An '■= {Ai,...,A„}, Aqo := |J„A„. Remark that max An < p for all n, |A„| = n and 
Ai C A 2 C .... For very n we enumerate the n elements of An by A(l, n) < ... < A(n, n). 
For every A E A^o we let n\ := min{n | A E A„} E N. 

Since / is of (H, Q, A)-type, H{X) := {/ > A} E H for A E A. Therefore there is 
Ax^n S A s.t. Ax^n T PWj where n > 0. We would like to use these Ax^n to construct Qn, 
but they need to be made compatible in order that {Qn, Fn)n becomes isomonotone. Hence 
we construct by induction a family of sets H(A,n) E H, A E A„, re E N with the following 
properties: 


Ax^n U H(A(i -|-1, re), re) U H(A, re — 1) U C H(A, re) C H{X) U N{X, re), p{N{X, re)) = 0. 

Here A{X{i + 1, re), re) is thought as empty if i = re and similarly H(A, re — 1) = 0 if re = 1 or 
A A„_i. All of these involved sets C are base sets with C C P(A) and hence by Lemma |43] 
there is such an H(re, A). Since Ax^n /^n H{X) we then also have H(A, re^ + re) j' H{X). 

Now for all re consider the chain Fn := {H(A, re) | A E A„} C A and the simple measure 
Qn on Fn given by: 

n 

^ ^ ^5 ^)) * lA(A(i,n),n) ^ ^ ^ A(A^,n) • 0) 

^—1 AgAtt, 


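Let $x \in B$. Let 

$$\Lambda_n(x) := \{\lambda \in \Lambda_n \mid x \in A(\lambda, n)\}.$$

Then $h_n(x) = \max \Lambda_n(x)$. And if $x \in A(\lambda, n)$ then $x \in A(\lambda, n+1)$, so $\Lambda_n(x) \subset \Lambda_{n+1}(x)$ and 
we have: 

$$h_n(x) = \max \Lambda_n(x) \le \max \Lambda_{n+1}(x) = h_{n+1}(x).$$

Furthermore if $\lambda \in \Lambda_n(x)$ then $x \in A(\lambda, n) \subset H(\lambda)$, implying $h(x) > \lambda$. Therefore $h_1 \le 
h_2 \le \dots \le h$. 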

On the other hand for all $\varepsilon > 0$, since $\Lambda_\infty$ is dense, there is an $n$ and $\lambda \in \Lambda_n$ with 
$h(x) - \varepsilon < \lambda < h(x)$. Then $x \in H(\lambda)$ and therefore for $n$ big enough $x \in A(\lambda, n)$ and then: 

$$h(x) \ge h_n(x) \ge \lambda > h(x) - \varepsilon.$$

This means $h_n(x) \uparrow h(x)$ for all $x \in B$, so we have $h_n \uparrow h$ pointwise and by monotone 
convergence $(Q_n, F_n) \uparrow P$. ■ 
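
The monotone step-function approximation used in this proof can be illustrated numerically. The following is a minimal sketch under simplifying assumptions that are not part of the proof: the sets $A(\lambda, n)$ are replaced by the exact level sets $\{f > \lambda\}$, the dense set of levels is taken to be the dyadic rationals in $(0, 1)$, and the helper names `dyadic_levels` and `h_n` are hypothetical.

```python
from fractions import Fraction


def dyadic_levels(n):
    """First n elements of a fixed enumeration of the dyadic rationals in (0, 1).

    Because the enumeration is fixed, Lambda_n is a prefix of Lambda_{n+1},
    which mirrors the nestedness Lambda_1 c Lambda_2 c ... used in the proof.
    """
    levels, k = [], 1
    while len(levels) < n:
        levels += [Fraction(j, 2 ** k) for j in range(1, 2 ** k, 2)]
        k += 1
    return levels[:n]


def h_n(f_x, n):
    """Step approximation at a point: the largest level in Lambda_n strictly
    below f(x), or 0 if no such level exists (assumes A(lambda, n) = {f > lambda})."""
    below = [lam for lam in dyadic_levels(n) if f_x > lam]
    return max(below, default=Fraction(0))


if __name__ == "__main__":
    f_x = Fraction(7, 10)
    for n in (1, 3, 7, 63):
        print(n, h_n(f_x, n))
```

In this simplified setting the values `h_n(f_x, n)` are non-decreasing in `n` and never exceed `f_x`, which is the pointwise behaviour the proof exploits before invoking monotone convergence.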

Proof of Theorem 23 Let $f$ be a density as supposed and set $F := s(F_{f,\Lambda})$. By assumption 
$F$ is finite. If $|F| = 1$ then $F_{f,\Lambda}$ is a chain and the Theorem follows from Lemma 44 
using $B_n = \emptyset$, $n \in \mathbb{N}$, in the notation of the lemma. Hence we can now assume $|F| > 1$. We 
prove by induction over $|F|$ that $f\,d\mu \in \bar{\mathcal{S}}(\mathcal{A})$ and $c(f\,d\mu) = s(F_{f,\Lambda})$, and assume that this 
is true for all $f'$ with level forests $|s(F_{f',\Lambda'})| < |F|$. For readability we first handle the case 
that $F$ is not a tree. 

Assume that F has two or more roots Ai,... ,Ak with k = k{0). Denote by fi := /|^ 
the corresponding densities, hence / = /i + • • • + /fc, and set Fi := s(Ty. a) = and 

Pi ■= fi d/x. We cannot use DisjointAdditivity, because separation of the Ai does not 
imply separation of the supports. Hence we have to construct a B-adapted isomonotone 
sequence {Qn, Fn) B. Since B = Bi U ... U B^ we have |Bj| < |B| and hence by induction 
assumption for all i < we have c(Bj) = B,, and there is an isomonotone Bj-adapted 
sequence {Qi^n, Bj,„) B*. For Qn := Qi,n + ■ • • + Qk,n and B„ ;= Bp„ U ... U Fk,n it is 

clear that (Qm B„) ^ B is isomonotone. Let b G Qp and B := supp b. We show that this is 
< 3 D^-connected to exactly one Aj. There is /3 > 0 s.t. db = (dlpdfi and (dip < / /U-a.s. Now 
let A G A with X < ^ and A < inf | A' G A | k{X') 7 ^ A:(0) }. Because for all A G A also the 
closures of clusters are T-separated we have 

B C B^lA) = B^U ... U B^. 

By connectedness there is a unique i < k with B C Bi(X) and by monotonicity B T Bj{X) 
for all i ^ j. Since this holds for all A G A small enough and A is dense, this means that B is 
c 3 D^-connected to exactly i. Using this, B-adaptedness of Qn is inherited from Bj-adaptedness 
of Qi,n- Therefore B = lim„ Qn G B and c(B) = B. 

Now assume that $F$ is a tree. Since $|F| > 1$ there are direct children $A_1, \dots, A_k$ of 
the root in the structured forest $F$ with $k \ge 2$. Let $\rho := \inf\{\lambda \in \Lambda \mid k(\lambda) \ne 1\}$. Since 
$F$ is a tree, $\rho > 0$. Let $f_0(\omega) := \min\{\rho, f(\omega)\}$ and $f'(\omega) := \max\{0, f(\omega) - \rho\}$ for all 
$\omega \in \Omega$, and set $dP_0 := f_0\,d\mu$ and $dP' := f'\,d\mu$. Then $P = P_0 + P'$ is split into a pedestal 
corresponding to the root and its chain and the density corresponding to the children. We 
set $\Lambda' := \{\lambda - \rho \mid \lambda \in \Lambda, \lambda > \rho\}$. Then $|F_{f',\Lambda'}| = |F| - 1$ and by induction assumption 
there is $(Q'_n, F'_n) \uparrow P'$ adapted. Set $B_n := GF'_n$ and $B := \bigcup_n B_n$. Then by Lemma 44 there 
is $(Q_n, F_n) \uparrow P_0$ adapted, which is given by a density $h_n$. 
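
The splitting of the density at the first branching level can be checked on sampled values. This is only an illustrative sketch: the function name `split_at_level`, the sample values, and the choice of level are assumptions made for the example and do not come from the paper.

```python
def split_at_level(f_values, rho):
    """Split sampled density values into a capped part min(rho, f)
    and an excess part max(0, f - rho); the two parts add up to f."""
    f0 = [min(rho, v) for v in f_values]
    f_excess = [max(0.0, v - rho) for v in f_values]
    return f0, f_excess


if __name__ == "__main__":
    f_values = [0.0, 0.5, 1.0, 2.5]   # hypothetical samples of f
    f0, f_excess = split_at_level(f_values, rho=1.0)
    print(f0)        # capped part, bounded by rho
    print(f_excess)  # excess part, zero wherever f <= rho
```

The capped part carries the root and its chain, while the excess part is supported only where the density exceeds the level, so it carries the children.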

Now there might be a gap Sn '■= p — sup/i„ > 0. By construction —>■ 0 but to be 

precise we let 

Qn •— Qn B ^ ^ ' lyt dp. 

AGmaxi^.^ 

This is still a simple measure on Fn and therefore {Qn + Qn,Fn U Ff) B. We have to 
show B-adapted: 
Grounded: Is fulfilled, since we consider trees at the moment. 

Fine: Let Ci,... ,Ck G FnU be direct siblings. Then Ci,... ,Ck G F^ because is 
a chain. If they are contained in one of the roots of F^ fineness is inherited from 
adaptedness of Q'^. Else they are the roots of F^^. Let a = alAd^ G Qp be a 
basic measure that _LLp-intersects say Ci and C 2 . Then is clear that a < p and by 
P-subadditivity fineness is granted. 

Motivated Let C,C' G FnU Fl^ be direct siblings. Then again C,C' G If they are 
contained in one of the roots of F^ motivatedness is inherited from adaptedness of Q'^. 
Else they are the roots of F^^. Let a = alAdp G Qp be a base measure that supports 
Cl U C 2 . Again it is clear that a < p and hence it cannot majorize neither the level of 
C nor the one of C^ ■ 


Proof of Proposition 24 Since $f$ is continuous, all $H_f(\lambda)$ are open, and each is the disjoint 
union of its open connected components. We show that any connected component contains at 
least one of the $x_1, \dots, x_k$. To this end let $\lambda_0 \ge 0$ and $B_0$ be a connected component of 
$H_f(\lambda_0)$ (then $B_0 \ne \emptyset$). Because $\Omega$ is compact, so is the closure $\bar B_0$, and hence the maximum 
of $f$ on $\bar B_0$ is attained at some $y_0 \in \bar B_0$. Since there is $y_1 \in B_0$, we have $f(y_0) \ge f(y_1) > \lambda_0$ 
and thus $y_0 \in H_f(\lambda_0)$. Now $H_f(\lambda_0)$ is an open set, so $y_0$ is an inner point of this open set, and 
we know $y_0 \in \bar B_0$, therefore $y_0 \in B_0$. Therefore $y_0 \in B_0$ is a local maximum. 

Hence for all $\lambda$ there are at most $k$ components and $f$ is of $(\mathcal{A}, \mathcal{Q}, \perp_\emptyset)$-type. The 
generalized structure $s(F_f)$ is finite, since there are only $k$ leaves. 
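
The bound on the number of components can be observed on a one-dimensional grid. The sketch below is an illustration under assumptions of our own: the function is sampled at finitely many points, a component is a maximal run of consecutive grid points above the level, and the helper names are hypothetical.

```python
def level_set_components(values, lam):
    """Count maximal runs of consecutive grid points whose value exceeds lam."""
    count, inside = 0, False
    for v in values:
        if v > lam and not inside:
            count += 1
        inside = v > lam
    return count


def count_local_maxima(values):
    """Count strict interior local maxima of the sampled function."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    )


if __name__ == "__main__":
    values = [0, 1, 3, 1, 2, 5, 2, 0]   # hypothetical bimodal samples
    k = count_local_maxima(values)
    for lam in (0.5, 1.5, 4, 6):
        print(lam, level_set_components(values, lam), "<=", k)
```

For these samples every level set has at most as many components as there are local maxima, matching the counting argument of the proof.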

Now, fix for the moment a local maximum x,. Since Xi is a local maximum, there is eo 
s.t. /(y) < f{xi) for all y with d{y,Xi) < Eq. Eor all e G (0,eo) consider the sphere 


SsW ■■= {y gQ: f{y) > A and d{y,Xi) = e}. 


Since is compact and Se(A) is closed, it is also compact. So as A f fi^i) the Ss{X) is a 
monotone decreasing sequence of compact sets. Assume that all S'e(A) were non-empty: Let 
yn G 5£p/(„+i)(A) then {yn)n is a sequence in the compact set <S'eo/ 2 (A), hence there would 
be a subsequence converging to some y^. This subsequence eventually is in every 
and hence y^ G nA</(A) this would be non-empty. This means that /(y^) > f{xi). 

On the other hand, since e < eg we have /(ye) > f{xi). Therefore all y^ are local maxima, 
yielding a contradiction to the assumption that there are only finitely many. Hence for all 
e, *S'e(A) = 0 for all A G (Ae, f{xi). From this follows, that all local maxima have from some 
point on their own leaf in Ff. Therefore there is a bijection ?/): {xi,... ,Xfc} —>• minc(P) s.t. 
Xi G -ipixi). 

Lastly, we need to show that also the closures of the connected components are separated, 
to verify the conditions of Theorem [23l We are allowed to exclude a finite set of levels, in 
this case the levels Ai,..., Am at which A 1 —>■ k{X) G N changes. Consider 0 < Aq < Ai s.t. 
for all A G (Aq, Ai) k{X) stays constant. Set A := G (Aq, A). Now let A, A' be connected 
components of Hf{X) and let B,B' be the connected components of P/(A) with A d B and 
A' C B'. First we show A <Z B: let yg G A. Then there is (y„) C A with yn —>■ 2/0- Because 
/ is continuous we have 

A < f{yn) fivo) > A > A 


and hence yg G P. Similarly we have A' (Z B' and P ±0 B' implies A ±0 AA 
5.3 Proofs for Section 4 

Lemma 45 Let $A, A'$ be closed, non-empty, and (path-)connected. Then: 

$$A \cup A' \text{ is (path-)connected} \iff A \mathrel{\circ\!\circ}_\emptyset A'.$$

Therefore any finite or countable union $A_1 \cup \dots \cup A_k$, $k \le \infty$, of such sets is connected iff 
the graph induced by the intersection relation is connected. 

Proof of Lemma 45 Topological connectivity means that $A \cup A'$ cannot be written as 
disjoint union of closed non-empty sets. Hence, if $A \cup A'$ is connected, then this union cannot 
be disjoint. On the other hand, if $x \in A \cap A' \ne \emptyset$ and $A \cup A' = B \cup B'$ with non-empty 
closed sets, then $x \in B$ or $x \in B'$. Say $x \in B$, then still $B'$ has to intersect $A$ or $A'$, say 
$B' \cap A \ne \emptyset$. Then both $B, B'$ intersect $A$ and both $C := B \cap A$ and $C' := B' \cap A$ are closed 
and non-empty. But since $A = C \cup C'$ is connected there is $y \in C \cap C' \subset B \cap B'$ and 
therefore $B \cup B'$ is not a disjoint union. 

For path-connectivity: If $x \in A \cap A' \ne \emptyset$ then for all $y \in A \cup A'$ there is a path connecting 
$x$ to $y$, so $A \cup A'$ is path-connected. On the other hand, if $A \cup A'$ is path-connected then 
for any $x \in A$ and $x' \in A'$ there is a continuous path $f \colon [0, 1] \to A \cup A'$ connecting $x$ to $x'$. 
Then $B := f^{-1}(A)$ and $B' := f^{-1}(A')$ are closed and non-empty, and $B \cup B' = [0, 1]$. Since 
$[0, 1]$ is topologically connected there is $y \in B \cap B'$ and so $f(y) \in A \cap A'$. ■ 
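
For closed intervals of the real line, the equivalence between connectedness of a union and connectedness of the intersection graph can be checked directly. The following is a small sketch, not part of the paper: intervals are assumed to be given as pairs `(lo, hi)` with `lo <= hi`, and both function names are hypothetical.

```python
def intersects(a, b):
    """Closed intervals intersect iff neither lies strictly to the left of the other."""
    return a[0] <= b[1] and b[0] <= a[1]


def graph_connected(intervals):
    """Connectivity of the intersection graph, checked by depth-first search."""
    if not intervals:
        return True
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(intervals)):
            if j not in seen and intersects(intervals[i], intervals[j]):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(intervals)


def union_connected(intervals):
    """The union of closed intervals is connected iff a left-to-right sweep finds no gap."""
    if not intervals:
        return True
    ordered = sorted(intervals)
    reach = ordered[0][1]
    for lo, hi in ordered[1:]:
        if lo > reach:
            return False
        reach = max(reach, hi)
    return True


if __name__ == "__main__":
    for ivs in ([(0, 1), (1, 2), (2, 3)], [(0, 1), (2, 3)]):
        print(ivs, graph_connected(ivs), union_connected(ivs))
```

The two checks agree on every family of closed intervals, which is the one-dimensional instance of the lemma.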

Proof of Example 1 Reflexivity and monotonicity are trivial for all the three relations. 
Disjointness: Stability is trivial and connectedness follows from Lemma 45 and from the 
observation: 

$$A \subset B_1 \cup \dots \cup B_k \implies A = (A \cap B_1) \cup \dots \cup (A \cap B_k).$$

$\tau$-separation: Connectedness follows from the definition of $\tau$-connectedness. For stability 
let $A_n \uparrow A$ and $A_n \perp_\tau B$ for $n \in \mathbb{N}$ and observe 

$$d(A, B) = \inf_{x \in A} d(x, B) = \inf_{n \in \mathbb{N}} \inf_{x \in A_n} d(x, B) = \inf_{n \in \mathbb{N}} d(A_n, B) \ge \tau.$$
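
The stability argument for $\tau$-separation can be tried on finite point sets in the plane. This sketch assumes the usual convention that the distance between two sets is the infimum of the pairwise distances; the function name `set_distance` and the sample points are hypothetical.

```python
import math


def set_distance(A, B):
    """Distance between two finite point sets: the minimum pairwise Euclidean distance."""
    return min(math.dist(x, y) for x in A for y in B)


if __name__ == "__main__":
    B = [(3.0, 0.0)]
    A1 = [(0.0, 0.0)]
    A2 = A1 + [(1.0, 0.0)]
    A3 = A2 + [(1.0, 1.0)]   # increasing chain A1 c A2 c A3
    tau = 2.0
    distances = [set_distance(An, B) for An in (A1, A2, A3)]
    print(distances)                 # each entry is at least tau
    print(set_distance(A3, B))       # distance of the union equals the smallest entry
```

Since the distance of the union to `B` is the minimum over the chain, a separation of every member by at least `tau` carries over to the union.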


Linear Separation: Connectedness follows from the condition on A since A C Hi U ... U Rfc 

implies A = A H Hi U . .. U A D To prove stability let A„ hn ^ a-^d A„ B for n G N. 
Observe that 

V 1 -^ sup{q; G M I ( X I a ) < aVa G A} 

is continuous and the same holds for the upper bound for the a. Hence for each n and 
any vector x G Lf with ( x | x ) = 1 there is a compact, possibly empty interval In{v) of a 
fulfilling the separation along x. Since by assumption the unit sphere is compact so is the 
semi-direct product := {(x,a) | a G Iniv)}. Since Ln A ^ In d> In+i is a monotone 
limit of non-empty compact sets, the limit P|^ In is non-empty. ■ 


Lemma 46 Let pL G . If C C /C(/i) then Cjj_(C) C /C(//). 

Proof of Lemma 46 Let $A = C_1 \cup \dots \cup C_k \in \mathbb{C}_{\perp\!\!\!\perp}(\mathcal{C})$, then: 

$$\operatorname{supp} 1_A\,d\mu = \operatorname{supp}(1_{C_1} + \dots + 1_{C_k})\,d\mu = C_1 \cup \dots \cup C_k = A. \qquad ■$$


Lemma 47 Let C d B be a class of non-empty closed sets. We assume the following gen¬ 
eralized stability: If B ^ B and € C form a connected subgraph of G \ \ jC): 

Ai AL B Mi <k ^ A AL B. 

Then Cjj_(C) is AL-intersection additive. Furthermore the monotone closure Cjj_(C) is 

Cjj_ (C) := { Cl U C 2 U ... I Cl, C 2 ,... € C and the graph G^ ({C*!, C 2 ,...}) is eonneeted } 

Proof of Lemma I47t Let ^ = Ci U ... U C^, ^4' = C( U ... U C^, G C(C) with A oo A'. If 
for all j < n' we had Cj _1L A then by assumption A' AL A and therefore there has to be 
j < n' with Cj OD A. By the same argument there then is i < n with Cj od Cj. Therefore 
the intersection graph on Ci,..., Cn, C[,..., C^/ is connected and 

^ u = Cl u... u c„ u c( u... u c;, G C(C). 

Let B G C(C) and Ai,A 2 ,... G C(C) with A^ f B. Then for all n we have A^ = 
Cni U ... U Cnk{n) with C„j G C and their intersection graph is connected. Since An C ^n+i 
for all Cnj there is j' with Cnj C C^n+i),j' which even gives C„j od C^n+i)j'- Hence, the 
family {Cnj}n,j being countable can be enumerated Ci, C 2 ,... s.t. for all m there is i{m) < m 
with Cm ® Ci(^m)- Therefore for all m, the intersection graph on Ci,..., Cm is connected 
and hence 

:= Cl U . . . U Cm G C(C). 

And we see that |J^ Am G C(C) and therefore 

B = [jAn = [jCnj = (jCm eCiC). 
n nj m 

Now let B G C(C) and B = with Cn G C and s.t. the intersection graph on 

Cl, C 2 ,... is connected. By Zorn’s Lemma it has a spanning tree. Since there are at most 
countable many nodes, one can assume that this tree is locally countable and therefore 
there is an enumeration of the nodes C„(ip C„( 2 )) • • ■ s-t. they form a connected subgraph 
for all m. Then the intersection graph on C„(i),..., C„(m) connected for all m and 
therefore Am '■= Cn(i) U ... U Cn{m) ^ C(C). Am G Ci{C) t H is monotone and we have 

B = \JAmeCdn- ■ 
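
The enumeration step, in which every initial segment of the nodes forms a connected subgraph, can be sketched for finite graphs with a breadth-first traversal. This is an illustration only: the proof concerns countable graphs and argues via a spanning tree, whereas the code below assumes a finite connected graph given as an adjacency dictionary, and the function name is hypothetical.

```python
from collections import deque


def connected_prefix_order(adjacency, start):
    """Breadth-first order of a connected graph: every node after the first
    is adjacent to some node that appears earlier in the order."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


if __name__ == "__main__":
    adjacency = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}   # hypothetical intersection graph
    print(connected_prefix_order(adjacency, 0))
```

Because each new node touches an earlier one, the union of the first `m` sets in this order is connected for every `m`, which is how the increasing sequence of finite unions arises.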

Proposition 48 Let C C B be a class of non-empty, closed events and L a C-separation 
relation. We assume the following generalized eountable stability: If B d B and Ai, A 2 ,... G 
C form a connected subgraph ofG±{C): 

An L B Vn An T B. 

n 

Then L is a C±{C)-separation relation. 
Proof of Proposition [48} Set A := C_l. The assumption assures ^-stability. We have to 
show ^-connectedness. So let A G ^ and Bi,..., Bj- & B closed with: 

Ac BiU...UBk. 

By definition of C there are Ci,..., Cn G C with A = CiU.. .UCn and s.t. the T-intersection 
graph on {Ci, ... ,Cn} is connected. For all j < n we have Cj C A C Bi U ... U Bf^ and 
by C-connectedness there is i{j) < k with Cj C Now, whenever i{j) ^ f(j') since 

OD Byjy we have by monotonicity Cj od Cj/. So whenever there is an edge between Cj 
and Cj! then i[j) = i{j'). This means that i[-) is constant on connected components of the 
graph, and hence on the whole graph. ■ 

Proposition 49 Let C C B be a class of non-empty, closed events and T a C-separation 
relation with the following alternative C±{C)-stability: For all Ai, A 2 ,... G C and B G B: 

Q II ({Ai, A 2 ,...}) is connected and for all n: An T B |^A„Ti?. (28) 

n 

Then L is a C±{C)-separation relation and Cj_(C) is T-intersection additive. 

Assume furthermore _1L is a weaker relation (B T B' B _LL B'). Then T is a 

C II (C)-separation relation and Cjj_(C) is AL-intersection additive. 

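Proof of Proposition 49 The first part is a corollary of Lemma 47 and Proposition 48. 
For the second part observe $\mathbb{C}_{\perp\!\!\!\perp}(\mathcal{C}) \subset \mathbb{C}_\perp(\mathcal{C})$, hence $\perp$ is also a $\mathbb{C}_{\perp\!\!\!\perp}(\mathcal{C})$-separation relation. 
But now $\mathbb{C}_{\perp\!\!\!\perp}(\mathcal{C})$ is only $\perp\!\!\!\perp$-intersection additive. ■ 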

Proof of Proposition 1261 First if A„ f B £ A then for all x,x' £ B there is n with 
x,x' £ An and since A„ is path-connected there is a path connecting x and x' in A„ C B, 
so they are connected also in B. 

Let O be open and path-connected. Let {An)n C A! be the subsequence of all A G A' 
with A C O. Since O is open and A! a neighborhood base O = Consider the 

graph on the (A„)„ given by the intersection relation. Then by Zorn’s Lemma there is a 
spanning tree, and we can assume that it is locally at most countable. Therefore there is 
an enumeration A'^, A' 2 ,... such that {A'^,..., A(^} is a connected sub-graph for all n. By 
intersection-additivity hence A„ := A'^ U ... U A(j G A and A„ f O. ■ 

Lemma 50 Let pL £ and assume there is a B £ /C(/i) with dP = 1 b dp,. Assume 

that [A, , A-jf) is a P-subadditive stable clustering base and {Qn,Pn) f P is adapted. 

Then s{Fn) = {Af ,..., A^} consists only of roots and can be ordered in sueh a way that 
A1 C A? C .... The limit forest Poo then consists of the k pairwise A-^-separated sets: 

:= 

n>l 

there is a p-null set N £ B with 

B = Bi\a ... \j Bk U® N. (29) 
Proof of Lemma I50t Once we have shown that all s{Fn) only consists of their roots, the 
rest is a direct consequence of the isomonotonicity, and the fact that there is a //-null set N 
s.t.: 

B = supp P = U supp Qn = Bi cf ... U N. 

n 


Now let ^4,^4' G Pn be direct siblings and denote by o =, a' < P their levels in Qn- Then there 
are a, a' > 0 with a = alAdfi and a' = dfi. Now, a, a' < P implies alA,a'lA' < Is 
(/i-a.s.) and hence a, a' < 1. Assume they have a common root Aq € maxP„, i.e. A U A' C 
Aq C B. Then aXA^olXa' < Iaq F 1b (//-a.s.) and hence they cannot be motivated. ■ 


Proof of Lemma 1271 The Hausdorff-dimension is calculated in (Falconer, 19931 . Corollary 
2.4). Proposition 2.2 therein gives for all events B d C and B' C C: 


P'*(</?(P)) < c|P"(P) and W{^-^{B')) < c\W{B'). 


We show that C is a P^-support set. Let B' C C be any relatively open set and set 
B := (p~^{B') C C . Then P C C is open because </? is a homeomorphism. And since C is a 
support set we have 0 < P*(P) < oo. This gives 

0 < P*(P) = n^{^-^{B')) < c^P"(P') and P'*(P') = n^{ip{B)) < c|P'*(P) < oo. 


Therefore C is a P*-support set. ■ 

Proof of Proposition 1281 The proof is split into four steps: (a). We first show that for 
all A G ^ there is a unique index i{A) with A G To this end, we fix an A G A. 

Then there is i < m with A G A*. Let // G Q* be the corresponding base measure with 
supp/i = A. Let j <m and fi' G be another measure with supp/i^ = A. Then //(A) = 1 
and {A) = 1. If j > f then by assumption // ^ /i' and this would give //^(A) =0. 11 j < i 
we have ^ and this would give //(A) = 0. So i = j. 

(h). Next we show that for all A, A' G A with A C A' we have i(A) < i(A'). To this 
end we first observe that A = A n A' = suppQ^ n suppQA'- If we had i > j then 

Qa' € -< 3 Qa and since Qa'IA) < Qai{A') = 1 < oo we would have Qa(A) = 0. 

Therefore i < j- 

(c) . Now we show that T is a stable A-separation relation. Clearly, it suffices to show A- 
stability and A-connectedness. The former follows since i(A„) is monotone if Ai C A2 C ... 

by (b) and hence eventually is constant. For the latter let A G A* and Bi,..., B^ G B 
± ± 

closed with Ac Pi U ... U P^. Then since T is an A*-separation relation there is j < k with 
ACB,. 

(d) . Finally, we show that (A, Q, T) is a stable clustering base. To this end observe that 

fittedness is inherited from the individual clustering bases. Let A G A* and A' G A^ with 

A C A'. Then i < j hy (b). 11 i = j then flatness follows from flatness of A*. If i < j then 
by assumption Qa -< Qa' and because Qa{A) = 1 < 00 we have Qa'{A) =0. ■ 

Proof of Proposition 1291 (a). Let 0 < P be a base measure on A G A*. If f = 
1 then (5 a(A n SUPPP2 ) < Qa(A) = 1 and by A^ -< P 2 we have Qa -< P 2 and hence 
P2(A n SUPPP2) = P 2 {A) = 0. Now for all events C G A'^ therefore a(C') = 0 < Pi(C') and 
for all C C A: 

a(C) < P{C) = aiPi(C) + a2P2{C) = aiPi(C). 
Now if i = 2 then by assumption Pi ^ o and since 0 < Pi (A PI suppPi) < oo we 
therefore have a{A n suppPi) < o(suppPi) = 0 and for all events C C \ suppPi we have 
a(C) < P(C') = q:2P2(C) and for all events C C suppPi: 

a(C') < ci(suppPi) = 0 < Pi(C'). 

(b). Let 0, a' < P be base measures on A € A* and A € with A cjd _4 A'. By the 
previous statement we then already have a < aiPi and a! < ajPj. Now, if i = j then by 
Pj-subadditivity of A* there is a base measure b < P* < P on P € A* with B D A U A'. 

Now if i A J consider say i = 2 and j = 1. Since AnsuppP2 D An A' A 0 tiy assumption 
0 can be majorized by a base measure o < P2 on A G A^ with suppPi C A and a > a. The 
latter also gives A C A and hence o supports A and supp Pi D supp a! and a > a. ■ 
Appendix A. Appendix: Measure and Integration Theoretic Tools 

Throughout this subsection, $\Omega$ is a Hausdorff space and $\mathcal{B}$ is its Borel $\sigma$-algebra. Recall that a measure $\mu$ on $\mathcal{B}$ is inner regular iff for all $A \in \mathcal{B}$ we have

$$\mu(A) = \sup \{\, \mu(K) \mid K \subset A \text{ is compact} \,\}.$$


A Radon space is a topological space such that all finite measures are inner regular. Cohn (2013, Theorem 8.6.14) gives several examples of such spaces, such as a) Polish spaces, i.e. separable spaces whose topology can be described by a complete metric, b) open and closed subsets of Polish spaces, and c) Banach spaces equipped with their weak topology. In particular, all separable Banach spaces equipped with their norm topology are Polish spaces, and infinite dimensional spaces equipped with the weak topology are not Polish spaces but are still Radon spaces. Furthermore, Hausdorff measures, which are considered in Section 4.3, are inner regular (Federer, 1969, Cor. 2.10.23). For any inner regular measure $\mu$ we define the support by

$$\operatorname{supp} \mu := \Omega \setminus \bigcup \{\, O \subset \Omega \mid O \text{ is open and } \mu(O) = 0 \,\}.$$

By definition the support is closed and hence measurable. The following lemma collects 
some more basic facts about the support that are used throughout this paper. 


Lemma 51 Let $\mu$ be an inner regular measure and $A \in \mathcal{B}$. Then we have:

(a) If $A \perp_\emptyset \operatorname{supp}\mu$, then we have $\mu(A) = 0$.

(b) If $\emptyset \ne A \subset \operatorname{supp}\mu$ is relatively open in $\operatorname{supp}\mu$, then $\mu(A) > 0$.

(c) If $\mu'$ is another inner regular measure and $\alpha, \alpha' > 0$ then

$$\operatorname{supp}(\alpha\mu + \alpha'\mu') = \operatorname{supp}(\mu) \cup \operatorname{supp}(\mu').$$

(d) The restriction $\mu|_A$ of $\mu$ to $A$ defined by $\mu|_A(B) = \mu(B \cap A)$ is an inner regular measure and $\operatorname{supp} \mu|_A \subset \overline{A \cap \operatorname{supp}\mu}$.

If $\mu$ is not inner regular, (d) also holds provided that $\Omega$ is a Radon space and $\mu(A) < \infty$.

Proof of Lemma 51 (a). We show that $A := \Omega \setminus \operatorname{supp}\mu$ is a $\mu$-null set. Let $K \subset A$ be any compact set. By definition $A$ is the union of all open sets $O \subset \Omega$ with $\mu(O) = 0$. So those sets form an open cover of $A$ and therefore of $K$. Since $K$ is compact there exists a finite sub-cover $\{O_1, \dots, O_n\}$ of $K$. By $\sigma$-subadditivity of $\mu$ we find

$$\mu(K) \le \sum_{i=1}^{n} \mu(O_i) = 0,$$

and since this holds for all such compact $K \subset A$ we have by inner regularity

$$\mu(A) = \sup_{K \subset A} \mu(K) = 0.$$

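(b). By assumption there is an open $O \subset \Omega$ with $\emptyset \ne A = O \cap \operatorname{supp}\mu$. Now $O \cap \operatorname{supp}\mu \ne \emptyset$ implies $\mu(O) > 0$. Moreover, we have the partition $O = (O \setminus \operatorname{supp}\mu) \cup (O \cap \operatorname{supp}\mu)$, and since $O \setminus \operatorname{supp}\mu$ is open, we know $\mu(O \setminus \operatorname{supp}\mu) = 0$, and hence we conclude that $\mu(O) = \mu(A)$.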

(c). This follows from the fact that for all open $O \subset \Omega$ we have

$$(\alpha\mu + \alpha'\mu')(O) = \alpha\mu(O) + \alpha'\mu'(O) = 0 \iff \mu(O) = 0 \text{ and } \mu'(O) = 0.$$

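(d). The measure $\mu' := \mu|_A$ is inner regular since for $B \in \mathcal{B}$ we have

$$\mu'(B) = \sup \{\, \mu(K') \mid K' \subset B \cap A \text{ is compact} \,\} \le \sup \{\, \mu'(K') \mid K' \subset B \text{ is compact} \,\} \le \mu'(B).$$

Now observe that $X \setminus \overline{A \cap \operatorname{supp}\mu} \subset X \setminus (A \cap \operatorname{supp}\mu) = (X \setminus A) \cup (X \setminus \operatorname{supp}\mu)$. For the open set $O := X \setminus \overline{A \cap \operatorname{supp}\mu}$ we thus find

$$\mu|_A(O) \le \mu|_A(X \setminus A) + \mu|_A(X \setminus \operatorname{supp}\mu) \le \mu(X \setminus \operatorname{supp}\mu) = 0. \quad ■$$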
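
Parts (c) and (d) of Lemma 51 can be checked by hand on a finite set with the discrete topology. There every subset is open and compact, every measure is inner regular, and the support is just the set of points with positive mass. The following is a minimal sketch under these assumptions; the dictionary representation and the helper names `support`, `combine` and `restrict` are illustrative choices and do not appear in the paper.

```python
from fractions import Fraction

def support(mu):
    """Support on a finite discrete space: the points carrying positive mass."""
    return {x for x, mass in mu.items() if mass > 0}

def combine(alpha, mu, alpha2, mu2):
    """Return the measure alpha*mu + alpha2*mu2 as a dict of point masses."""
    points = set(mu) | set(mu2)
    return {x: alpha * mu.get(x, 0) + alpha2 * mu2.get(x, 0) for x in points}

def restrict(mu, A):
    """Restriction mu|_A(B) = mu(B ∩ A), again as a dict of point masses."""
    return {x: (mass if x in A else 0) for x, mass in mu.items()}

# Two hypothetical measures on the points 0, 1, 2, 3 and a set A.
mu = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0)}
mu2 = {2: Fraction(1, 3), 3: Fraction(2, 3)}
A = {1, 2, 3}

# Lemma 51 (c): supp(alpha*mu + alpha'*mu') = supp(mu) ∪ supp(mu') for alpha, alpha' > 0.
mixed = combine(Fraction(2), mu, Fraction(5), mu2)
assert support(mixed) == support(mu) | support(mu2)

# Lemma 51 (d): supp(mu|_A) is contained in the closure of A ∩ supp(mu);
# in the discrete topology every set is closed, so the closure is A ∩ supp(mu) itself.
assert support(restrict(mu, A)) <= A & support(mu)
```

The discrete setting hides the role of the closure in (d); in a general Hausdorff space the inclusion can fail without it.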


Lemma 52 Let $Q, Q'$ be $\sigma$-finite measures.

(a) If $Q$ and $Q'$ have densities $h, h'$ with respect to some measure $\mu$ then

$$Q \le Q' \iff h \le h' \quad \mu\text{-a.s.}$$

(b) If $Q \le Q'$ then $Q$ is absolutely continuous with respect to $Q'$, i.e. $Q$ has a density function $h$ with respect to $Q'$, $dQ = h\,dQ'$, such that

$$h(x) \begin{cases} \in [0,1] & \text{if } x \in \operatorname{supp} Q', \\ = 0 & \text{else.} \end{cases}$$


Proof of Lemma 52 (a). "$\Leftarrow$": By monotonicity of the integral, a direct calculation gives

$$Q(B) = \int_B h \, d\mu \le \int_B h' \, d\mu = Q'(B).$$

"$\Rightarrow$": Assume $\mu(\{x : h(x) > h'(x)\}) > 0$. Then

$$\int_{h>h'} h \, d\mu = Q(\{h > h'\}) \le Q'(\{h > h'\}) = \int_{h>h'} h' \, d\mu < \int_{h>h'} h \, d\mu,$$

where the last inequality holds since we assume $\mu(\{x : h(x) > h'(x)\}) > 0$, again by monotonicity of the integral. This contradiction implies the statement.

(b). $Q \le Q'$ implies that every $Q'$-null set is a $Q$-null set. Furthermore, since $Q'$ is $\sigma$-finite, $Q$ is $\sigma$-finite as well. So we can use the Radon-Nikodym theorem, and there is an $h \ge 0$ s.t. $dQ = h\,dQ'$. Since the complement of $\operatorname{supp} Q'$ is a $Q'$-null set, we can assume $h(x) = 0$ on this complement.

We have to show that $h \le 1$ a.s. Let

$$E_n := \{h \ge 1 + \tfrac{1}{n}\} \quad\text{and}\quad E := \{h > 1\}.$$

Then $E_n \uparrow E$ and we have

$$Q'(E_n) \ge Q(E_n) = \int_{E_n} h \, dQ' \ge \left(1 + \tfrac{1}{n}\right) Q'(E_n),$$

which implies $Q'(E_n) = 0$ for all $n$. Therefore $Q'(E) = \lim_n Q'(E_n) = 0$. ■
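
Lemma 52 (b) becomes elementary on a finite set, where the density is a ratio of point masses. The sketch below assumes this finite setting and uses exact rational arithmetic; the measures `q` and `q_dom` and the helpers `density` and `dominated` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def density(q, q_dom):
    """Density h = dQ/dQ' on a finite space, set to 0 off the support of Q'."""
    return {x: (q.get(x, Fraction(0)) / m if m > 0 else Fraction(0))
            for x, m in q_dom.items()}

def dominated(q, q_dom):
    """Check Q <= Q' setwise; on a finite space comparing point masses suffices."""
    return all(q.get(x, 0) <= q_dom.get(x, 0) for x in set(q) | set(q_dom))

# Hypothetical measures with Q <= Q'; the point "d" lies outside supp Q'.
q_dom = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(0), "d": Fraction(0)}

assert dominated(q, q_dom)
h = density(q, q_dom)

# Lemma 52 (b): the density takes values in [0, 1] and vanishes off supp Q'.
assert all(0 <= value <= 1 for value in h.values())
assert h["d"] == 0

# dQ = h dQ' recovers Q point by point.
assert all(h[x] * q_dom[x] == q[x] for x in q_dom)
```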


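Lemma 53 (a) Let $Q_n \uparrow P$, $A := \operatorname{supp} P$ and $B := \bigcup_n \operatorname{supp} Q_n$. Then $B \subset A$ and

$$P(A \setminus B) = 0.$$

(b) Assume $Q$ is a finite measure and $Q_1 \le Q_2 \le \dots \le Q$, and let the densities $h_n := \frac{dQ_n}{dQ}$. Then $h_1 \le h_2 \le \dots \le 1$ $Q$-a.s. Furthermore, the following are equivalent:

(i) $Q_n \uparrow Q$

(ii) $h_n \uparrow 1$ $Q$-a.s.

(iii) $h_n \uparrow 1$ in $L^1(Q)$.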

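Proof of Lemma 53

(a) Since $Q_n \le P$ we have $\operatorname{supp} Q_n \subset A$ and therefore $B \subset A$. Because of $(A \setminus B) \cap \operatorname{supp} Q_n = \emptyset$ for all $n$ and the convergence, we have

$$P(A \setminus B) = \lim_{n} Q_n(A \setminus B) = 0.$$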
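(b) By the previous lemma we have $h_1 \le h_2 \le \dots \le 1$ $Q$-a.s.

(i) $\Rightarrow$ (ii): Since $(h_n)_n$ is monotone $Q$-a.s. it converges $Q$-a.s. to a limit $h \le 1$. Let

$$E_n := \{h \le 1 - \tfrac{1}{n}\} \quad\text{and}\quad E := \{h < 1\}.$$

Then $E_n \uparrow E$ and we have by the monotone convergence theorem:

$$\lim_{m \to \infty} Q_m(E_n) = \lim_{m \to \infty} \int_{E_n} h_m \, dQ = \int_{E_n} h \, dQ \le \left(1 - \tfrac{1}{n}\right) Q(E_n).$$

But since $Q_m(E_n) \uparrow_m Q(E_n)$ this means that $Q(E_n) = 0$ for all $n$ and therefore $Q(E) = \lim_n Q(E_n) = 0$.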

(ii) $\Rightarrow$ (iii): This follows from monotone convergence, because $1 \in L^1(Q)$.

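(iii) $\Rightarrow$ (i): For all $B \in \mathcal{B}$:

$$0 \le Q(B) - Q_n(B) = \int_B (1 - h_n) \, dQ \le \int |1 - h_n| \, dQ \longrightarrow 0$$

because of $h_n \to 1$ in $L^1(Q)$.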
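
The equivalence in Lemma 53 (b) can be illustrated on a finite set. The sketch below assumes the particular increasing sequence $Q_n := (1 - \tfrac{1}{n}) Q$, which is a hypothetical example and not taken from the paper; its densities are $h_n = 1 - \tfrac{1}{n}$, so the $L^1(Q)$ distance to the constant function $1$ equals $Q(\Omega)/n$.

```python
from fractions import Fraction

# Hypothetical finite measure Q with total mass 2.
Q = {"a": Fraction(1, 2), "b": Fraction(3, 2)}

def Q_n(n):
    """Assumed increasing sequence Q_n := (1 - 1/n) Q, so Q_n <= Q_{n+1} <= Q."""
    return {x: (1 - Fraction(1, n)) * m for x, m in Q.items()}

def l1_distance_to_one(n):
    """Compute ||1 - h_n||_{L^1(Q)} for the density h_n = dQ_n/dQ."""
    qn = Q_n(n)
    return sum(abs(1 - qn[x] / Q[x]) * Q[x] for x in Q)

# (iii): the L^1(Q) distance is Q(Omega)/n = 2/n, which tends to 0.
assert [l1_distance_to_one(n) for n in (1, 2, 4)] == [2, 1, Fraction(1, 2)]

# (i): Q_n(B) increases towards Q(B), here for the set B = {"a"}.
assert Q_n(2)["a"] <= Q_n(3)["a"] <= Q["a"]
```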